Original Articles

Optimized Schwarz method without overlap for the gravitational potential equation on cluster of graphics processing unit

Pages 955-980 | Received 27 May 2014, Accepted 01 Jan 2015, Published online: 24 Mar 2015

References

  • J.I. Aliaga, M. Bollhöfer, A.F. Martín, and E.S. Quintana-Ortí, Parallelization of multilevel ILU preconditioners on distributed-memory multiprocessors, Proceedings of the 10th International Conference PARA, Reykjavík, Iceland, June 6–9, Vol. 7133, Revised Selected Papers, Part I, Lecture Notes in Computer Science, Springer, Berlin and Heidelberg, 2010, pp. 162–172.
  • H. Anzt, V. Heuveline, and B. Rocker, Mixed precision iterative refinement methods for linear systems: Convergence analysis based on Krylov subspace methods, Proceedings of the 10th International Conference PARA, Reykjavík, Iceland, June 6–9, Vol. 7134, Revised Selected Papers, Part II, Lecture Notes in Computer Science, Springer, Berlin and Heidelberg, 2010, pp. 237–247.
  • J.P. Arun, M. Mishra, and S.V. Subramaniam, Parallel implementation of MOPSO on GPU using OpenCL and CUDA, Proceedings of the 2011 18th International Conference on High Performance Computing, Washington, DC, USA, 2011, pp. 1–10.
  • A. Auger and N. Hansen, Tutorial CMA-ES: Evolution strategies and covariance matrix adaptation, Proceedings of the 14th Annual Conference Companion on Genetic and Evolutionary Computation (GECCO'12), New York, NY, USA, 2012, pp. 827–848.
  • J.M. Bahi, R. Couturier, and L.Z. Khodja, Parallel GMRES implementation for solving sparse linear systems on GPU clusters, in Proceedings of the 19th High Performance Computing Symposia, Boston, MA, Society for Computer Simulation International, San Diego, CA, 2011, pp. 12–19.
  • A. Bakhoda, G. Yuan, W. Fung, H. Wong, and T. Aamodt, Analyzing CUDA workloads using a detailed GPU simulator, IEEE International symposium on performance analysis of systems and software, Boston, MA, USA, April 26–28, 2009, pp. 163–174.
  • N. Bell and M. Garland, Efficient sparse matrix–vector multiplication on CUDA, Nvidia Technical Report NVR-2008-004, Nvidia Corporation, 2008. Available at http://www.nvidia.com/object/nvidia_research_pub_001.html (Accessed March 7, 2015).
  • N. Bell and M. Garland, Implementing sparse matrix–vector multiplication on throughput-oriented processors, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (SC'09), Portland, OR. ACM, New York, 2009, pp. 1–11.
  • J.-D. Benamou and B. Després, A domain decomposition method for the Helmholtz equation and related optimal control problems, J. Comput. Phys. 136 (1997), pp. 68–82. doi: 10.1006/jcph.1997.5742
  • J. Bolz, I. Farmer, E. Grinspun, and P. Schröder, Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid, ACM SIGGRAPH 2003 Papers, New York, 2003, pp. 917–924.
  • A.F. Camargos, V.C. Silva, J.M. Guichon, and G. Meunier, Iterative solution on GPU of linear systems arising from the A-V edge-FEA of time-harmonic electromagnetic phenomena, Proceedings of the 2014 22nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, Washington, DC, USA, IEEE Computer Society, 2014, pp. 365–371.
  • A.K. Cheik Ahamed and F. Magoulès, Fast sparse matrix–vector multiplication on graphics processing unit for finite element analysis, 2012 IEEE 14th International Conference on High Performance Computing and Communication and 2012 IEEE 9th International Conference on Embedded Software and Systems (HPCC-ICESS), IEEE Computer Society, 2012, pp. 1307–1314.
  • A.K. Cheik Ahamed and F. Magoulès, Iterative methods for sparse linear systems on graphics processing unit, 2012 IEEE 14th International Conference on High Performance Computing and Communication and 2012 IEEE 9th International Conference on Embedded Software and Systems (HPCC-ICESS), 25–27 June, Liverpool, UK, IEEE Computer Society, 2012, pp. 836–842.
  • A.K. Cheik Ahamed and F. Magoulès, Schwarz method with two-sided transmission conditions for the gravity equations on graphics processing unit, Proceedings of the 12th International Symposium on Distributed Computing and Applications to Business, Engineering and Science (DCABES), September 2–4, Kingston, London, UK, IEEE Computer Society, 2013, pp. 105–109.
  • A.K. Cheik Ahamed and F. Magoulès, Iterative Krylov methods for gravity problems on graphics processing unit, Proceedings of the 12th International Symposium on Distributed Computing and Applications to Business, Engineering and Science (DCABES), September 2–4, Kingston, London, UK, IEEE Computer Society, 2013, pp. 16–20.
  • P. Chevalier and F. Nataf, Symmetrized method with optimized second-order conditions for the Helmholtz equation, Domain Decomposition Methods, 10 (Boulder, CO, 1997), American Mathematical Society, Providence, RI, 1998, pp. 400–407.
  • A. Davidson, Y. Zhang, and J.D. Owens, An auto-tuned method for solving large tridiagonal systems on the GPU, Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium, Washington, DC, USA, IEEE Computer Society, 2011, pp. 956–965.
  • A. de La Bourdonnaye, C. Farhat, A. Macedo, F. Magoulès, and F.X. Roux, A non overlapping domain decomposition method for the exterior Helmholtz problem, Contemp. Math. 218 (1998), pp. 42–66. doi: 10.1090/conm/218/03001
  • B. Després, Domain decomposition method and the Helmholtz problem. II, Second International Conference on Mathematical and Numerical Aspects of Wave Propagation (Newark, DE, 1993), SIAM, Philadelphia, PA, 1993, pp. 197–206.
  • B. Després, P. Joly, and J.E. Roberts, A domain decomposition method for the harmonic Maxwell equations, in Iterative Methods in Linear Algebra (Brussels, 1991), R. Beauwens and P. de Groen, eds., Elsevier Science Publishers B. V., North-Holland, Amsterdam, 1992, pp. 475–484.
  • P. Du, R. Weber, P. Luszczek, S. Tomov, G. Peterson, and J. Dongarra, From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming, Parallel Comput. 38(8) (2012), pp. 391–407. doi: 10.1016/j.parco.2011.10.002
  • P. Du, P. Luszczek, and J. Dongarra, OpenCL Evaluation for Numerical Linear Algebra Library Development, Symposium Application Accelerators in High Performance Computing (SAAHPC10), 2010.
  • T. Dufaud and D. Tromeur-Dervout, Efficient parallel implementation of the fully algebraic multiplicative Aitken-RAS preconditioning technique, Adv. Eng. Softw. 53 (2012), pp. 33–44. doi: 10.1016/j.advengsoft.2012.07.005
  • IEEE 754: Standard for Binary Floating-Point Arithmetic, 2008. Available at http://grouper.ieee.org/groups/754 (Accessed September 24, 2014).
  • M.J. Gander, L. Halpern, and F. Magoulès, An optimized Schwarz method with two-sided Robin transmission conditions for the Helmholtz equation, Int. J. Numer. Methods Fluids 55 (2007), pp. 163–175. doi: 10.1002/fld.1433
  • M. Gander, L. Halpern, F. Magoulès, and F.X. Roux, Analysis of patch substructuring methods, Int. J. Appl. Math. Comput. Sci. 17 (2007), pp. 395–402. doi: 10.2478/v10006-007-0032-1
  • M.J. Gander, F. Magoulès, and F. Nataf, Optimized Schwarz methods without overlap for the Helmholtz equation, SIAM J. Sci. Comput. 24 (2002), pp. 38–60.
  • M. Garbey and D. Tromeur-Dervout, On some Aitken-like acceleration of the Schwarz method, Int. J. Numer. Methods Fluids 40(2) (2002), pp. 1493–1513. doi: 10.1002/fld.407
  • S. Ghanemi, A domain decomposition method for Helmholtz scattering problems, Ninth International Conference on Domain Decomposition Methods, ddm.org, 1997, pp. 105–112.
  • T.D. Han and T.S. Abdelrahman, hicuda: High-level GPGPU programming, IEEE Trans. Parallel Distrib. Syst. 22 (2011), pp. 78–90. doi: 10.1109/TPDS.2010.62
  • C. Janna, M. Ferronato, and G. Gambolati, A block FSAI-ILU parallel preconditioner for symmetric positive definite linear systems, SIAM J. Sci. Comput. 32 (2010), pp. 2468–2484. doi: 10.1137/090779760
  • C. Japhet and F. Nataf, The best interface conditions for domain decomposition methods: Absorbing boundary conditions, in Absorbing Boundaries and Layers, Domain Decomposition Methods. Applications to Large Scale Computations, L. Tourrette and L. Halpern, eds., Nova Science Publishers Inc., New York, 2001, pp. 348–373.
  • Khronos Group, The OpenCL Specification, 2010, Available at http://www.khronos.org (Accessed September 24, 2014).
  • J. Krüger and R. Westermann, Linear algebra operators for GPU implementation of numerical algorithms, ACM Trans. Graph. 22 (2003), pp. 908–916. doi: 10.1145/882262.882363
  • J. Kruis, Domain Decomposition Methods for Distributed Computing, Saxe-Coburg Publications, Stirling, Scotland, 2007.
  • R. Li and Y. Saad, GPU-accelerated preconditioned iterative linear solvers, J. Supercomput. 63(2) (2013), pp. 443–466.
  • P.L. Lions, On the Schwarz alternating method. I, First International Symposium on Domain Decomposition Methods for Partial Differential Equations, SIAM, Philadelphia, PA, 1988, pp. 1–42.
  • P.L. Lions, On the Schwarz alternating method. II, in Domain Decomposition Methods, T.F. Chan, R. Glowinski, J. Périaux, and O. Widlund, eds., SIAM, Philadelphia, PA, 1989, pp. 47–70.
  • P.L. Lions, On the Schwarz alternating method. III: A variant for nonoverlapping subdomains, Third International Symposium on Domain Decomposition Methods for Partial Differential Equations, Houston, TX, March 20–22, 1989, SIAM, Philadelphia, PA, 1990, pp. 202–223.
  • Y. Maday and F. Magoulès, Non-overlapping additive Schwarz methods tuned to highly heterogeneous media, Comptes Rendus à l'Académie des Sci. 341 (2005), pp. 701–705.
  • Y. Maday and F. Magoulès, Absorbing interface conditions for domain decomposition methods: A general presentation, Comput. Methods Appl. Mech. Eng. 195 (2006), pp. 3880–3900. doi: 10.1016/j.cma.2005.01.025
  • Y. Maday and F. Magoulès, Improved ad hoc interface conditions for Schwarz solution procedure tuned to highly heterogeneous media, Appl. Math. Model. 30 (2006), pp. 731–743. doi: 10.1016/j.apm.2005.05.020
  • Y. Maday and F. Magoulès, Optimized Schwarz methods without overlap for highly heterogeneous media, Comput. Methods Appl. Mech. Eng. 196 (2007), pp. 1541–1553. doi: 10.1016/j.cma.2005.05.059
  • F. Magoulès, A.-K. Cheik Ahamed, and R. Putanowicz, Auto-tuned Krylov methods on cluster of graphics processing unit, Int. J. Comput. Math. 92(6) (2015), pp. 1222–1250.
  • F. Magoulès, P. Iványi, and B.H.V. Topping, Convergence analysis of Schwarz methods without overlap for the Helmholtz equation, Comput. Struct. 82 (2004), pp. 1835–1847. doi: 10.1016/j.compstruc.2004.02.025
  • F. Magoulès and F.-X. Roux, Lagrangian formulation of domain decomposition methods: A unified theory, Appl. Math. Model. 30 (2006), pp. 593–615. doi: 10.1016/j.apm.2005.06.016
  • F. Magoulès, F.-X. Roux, and L. Series, Algebraic way to derive absorbing boundary conditions for the Helmholtz equation, J. Comput. Acoust. 13 (2005), pp. 433–454. doi: 10.1142/S0218396X05002827
  • F. Magoulès, F.-X. Roux, and L. Series, Algebraic approximation of Dirichlet-to-Neumann maps for the equations of linear elasticity, Comput. Methods Appl. Mech. Eng. 195 (2006), pp. 3742–3759. doi: 10.1016/j.cma.2005.01.022
  • F. Magoulès, F.-X. Roux, and L. Series, Algebraic Dirichlet-to-Neumann mapping for linear elasticity problems with extreme contrasts in the coefficients, Appl. Math. Model. 30 (2006), pp. 702–713. doi: 10.1016/j.apm.2005.07.008
  • F. Magoulès, F.-X. Roux, and L. Series, Algebraic approach to absorbing boundary conditions for the Helmholtz equation, Int. J. Comput. Math. 84 (2007), pp. 231–240. doi: 10.1080/00207160601168605
  • K.K. Matam and K. Kothapalli, Accelerating sparse matrix vector multiplication in iterative methods using GPU, Proceedings of the 2011 International Conference on Parallel Processing (ICPP '11), Washington, DC, USA, IEEE Computer Society, 2011, pp. 612–621.
  • J. Meredith, D. Bremer, L. Flath, J. Johnson, H. Jones, S. Vaidya, and R. Frank, The GAIA project: Evaluation of GPU-based programming environments for knowledge discovery, Tech. rep., Lawrence Livermore National Labs, Livermore, 2004.
  • F. Nataf, F. Rogier, and E. de Sturler, Optimal Interface Conditions for Domain Decomposition Methods, CMAP (Ecole Polytechnique) 301 (1994), pp. 1–18.
  • S. Noury, S. Boivin, and O. Le Maître, A Fast Poisson Solver for OpenCL using Multigrid Methods, in GPU Pro 2, W. Engel, ed., A.K. Peters, 2011, pp. 445–471. ISBN 978-1-56881-718-7.
  • Nvidia Corporation, CUDA toolkit 4.0, CUBLAS Library, 2011. Available at http://developer.nvidia.com/cuda-toolkit-40 (Accessed September 24, 2014).
  • J.D. Owens, M. Houston, D. Luebke, S. Green, J.E. Stone, and J.C. Phillips, GPU computing, Proc. IEEE 96 (2008), pp. 879–899. doi: 10.1109/JPROC.2008.917757
  • J.D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krüger, A.E. Lefohn, and T.J. Purcell, A survey of general-purpose computation on graphics hardware, EUROGRAPHICS (2005). Available at http://www.blackwell-synergy.com/doi/pdf/10.1111/j.1467-8659.2007.01012.x (Accessed March 7, 2015).
  • A. Quarteroni and A. Valli, Domain Decomposition Methods for Partial Differential Equations, Oxford University Press, Oxford, 1999.
  • S. Ristov, M. Gusev, L. Djinevski, and S. Arsenovski, Performance impact of reconfigurable L1 cache on GPU devices, Federated Conference on Computer Science and Information Systems (FedCSIS 2013), IEEE Conference Proceedings, Krakow, Poland, September 9–11, 2013, pp. 507–510.
  • F.X. Roux, F. Magoulès, L. Series, and Y. Boubendir, Approximation of optimal interface boundary conditions for two-Lagrange multiplier FETI method, Proceedings of the 15th International Conference on Domain Decomposition Methods, Berlin, Germany, July 21–25, 2003, R. Kornhuber, R. Hoppe, J. Périaux, O. Pironneau, O. Widlund, and J. Xu, eds., Lecture Notes in Computational Science and Engineering, Springer-Verlag, Heidelberg, 2005, pp. 283–290.
  • H. Schwarz, Über einen Grenzübergang durch alternierendes Verfahren, Vierteljahrsschrift der Naturforschenden Gesellschaft in Zürich 15 (1870), pp. 272–286.
  • B. Smith, P. Bjorstad, and W. Gropp, Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations, Cambridge University Press, New York, NY, 1996.
  • J. Sulaiman, M. Othman, and M.K. Hasan, Nine Point-EDGSOR Iterative method for the finite element solution of 2D Poisson equations, in International Conference Computational Science and Its Applications (ICCSA 2009), Seoul, Korea, June 29–July 2, Lecture Notes in Computer Science, Vol. 5592, O. Gervasi, D. Taniar, B. Murgante, A. Laganà, Y. Mun, and M. Gavrilova, eds., Springer, Berlin and Heidelberg, 2009, pp. 764–774.
  • J.E. Stone, D. Gohara, and G. Shi, OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems, IEEE Des. Test 12(3) (2010), pp. 66–73.
  • C.J. Thompson, S. Hahn, and M. Oskin, Using modern graphics architectures for general-purpose computing: a framework and analysis, Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture, MICRO 35, Istanbul, Turkey, IEEE Computer Society Press, Los Alamitos, CA, 2002, pp. 306–317.
  • P. Tillet, K. Rupp, and S. Selberherr, An Automatic OpenCL Compute Kernel Generator for Basic Linear Algebra Operations, Proceedings of the 2012 Symposium on High Performance Computing, Orlando, FL, 2012, pp. 4:1–4:2.
  • A. Toselli and O. Widlund, Domain Decomposition methods: Algorithms and Theory, Springer-Verlag, Berlin, 2005.
  • L. Tourrette and L. Halpern, Absorbing Boundaries and Layers, Domain Decomposition Methods: Applications to Large Scale Computers, Nova Science Publishers, 2001. Available at http://books.google.fr/books?id=KrCsqv6WStwC.
  • A.H.E. Zein and A.P. Rendell, Generating optimal CUDA sparse matrix–vector product implementations for evolving GPU hardware, Concurrency Comput: Pract. Exp. 24 (2012), pp. 3–13. doi: 10.1002/cpe.1732
