International Journal of Computer Mathematics Volume 93, 2016 - Issue 6

Original Articles

Optimized Schwarz method without overlap for the gravitational potential equation on cluster of graphics processing unit

Frédéric Magoulès, Ecole Centrale, Paris, France (corresponding author)

Abal-Kassim Cheik Ahamed, Ecole Centrale, Paris, France

Roman Putanowicz, Institute for Computational Civil Engineering (L-5), Cracow University of Technology, Cracow, Poland

Pages 955-980 | Received 27 May 2014, Accepted 01 Jan 2015, Published online: 24 Mar 2015

https://doi.org/10.1080/00207160.2015.1011628

References

J.I. Aliaga, M. Bollhöfer, A.F. Martín, and E.S. Quintana-Ortí, Parallelization of multilevel ILU preconditioners on distributed-memory multiprocessors, Proceedings of the 10th International Conference PARA, Reykjavík, Iceland, June 6–9, Vol. 7133, Revised Selected Papers, Part I, Lecture Notes in Computer Science, Springer, Berlin and Heidelberg, 2010, pp. 162–172.
H. Anzt, V. Heuveline, and B. Rocker, Mixed precision iterative refinement methods for linear systems: Convergence analysis based on Krylov subspace methods, Proceedings of the 10th International Conference PARA, Reykjavík, Iceland, June 6–9, Vol. 7134, Revised Selected Papers, Part II, Lecture Notes in Computer Science, Springer, Berlin and Heidelberg, 2010, pp. 237–247.
J.P. Arun, M. Mishra, and S.V. Subramaniam, Parallel implementation of MOPSO on GPU using OpenCL and CUDA, Proceedings of the 2011 18th International Conference on High Performance Computing, Washington, DC, USA, 2011, pp. 1–10.
A. Auger and N. Hansen, Tutorial CMA-ES: Evolution strategies and covariance matrix adaptation, Proceedings of the 14th Annual Conference Companion on Genetic and Evolutionary Computation (GECCO'12), New York, NY, USA, 2012, pp. 827–848.
J.M. Bahi, R. Couturier, and L.Z. Khodja, Parallel GMRES implementation for solving sparse linear systems on GPU clusters, in Proceedings of the 19th High Performance Computing Symposia, Boston, MA, Society for Computer Simulation International, San Diego, CA, 2011, pp. 12–19.
A. Bakhoda, G. Yuan, W. Fung, H. Wong, and T. Aamodt, Analyzing CUDA workloads using a detailed GPU simulator, IEEE International Symposium on Performance Analysis of Systems and Software, Boston, MA, USA, April 26–28, 2009, pp. 163–174.
N. Bell and M. Garland, Efficient sparse matrix–vector multiplication on CUDA, Nvidia Technical Report NVR-2008-004, Nvidia Corporation, 2008. Available at http://www.nvidia.com/object/nvidia_research_pub_001.html (Accessed March 7, 2015).
N. Bell and M. Garland, Implementing sparse matrix–vector multiplication on throughput-oriented processors, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (SC'09), Portland, OR. ACM, New York, 2009, pp. 1–11.
J.-D. Benamou and B. Després, A domain decomposition method for the Helmholtz equation and related optimal control problems, J. Comput. Phys. 136 (1997), pp. 68–82. doi: 10.1006/jcph.1997.5742
J. Bolz, I. Farmer, E. Grinspun, and P. Schröder, Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid, ACM SIGGRAPH 2003 Papers, New York, 2003, pp. 917–924.
A.F. Camargos, V.C. Silva, J.M. Guichon, and G. Meunier, Iterative solution on GPU of linear systems arising from the A-V edge-FEA of time-harmonic electromagnetic phenomena, Proceedings of the 2014 22nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, Washington, DC, USA, IEEE Computer Society, 2014, pp. 365–371.
A.K. Cheik Ahamed and F. Magoulès, Fast sparse matrix–vector multiplication on graphics processing unit for finite element analysis, 2012 IEEE 14th International Conference on High Performance Computing and Communication and 2012 IEEE 9th International Conference on Embedded Software and Systems (HPCC-ICESS), IEEE Computer Society, 2012, pp. 1307–1314.
A.K. Cheik Ahamed and F. Magoulès, Iterative methods for sparse linear systems on graphics processing unit, 2012 IEEE 14th International Conference on High Performance Computing and Communication and 2012 IEEE 9th International Conference on Embedded Software and Systems (HPCC-ICESS), Liverpool, UK, June 25–27, IEEE Computer Society, 2012, pp. 836–842.
A.K. Cheik Ahamed and F. Magoulès, Schwarz method with two-sided transmission conditions for the gravity equations on graphics processing unit, Proceedings of the 12th International Symposium on Distributed Computing and Applications to Business, Engineering and Science (DCABES), Kingston, London, UK, September 2–4, IEEE Computer Society, 2013, pp. 105–109.
A.K. Cheik Ahamed and F. Magoulès, Iterative Krylov methods for gravity problems on graphics processing unit, Proceedings of the 12th International Symposium on Distributed Computing and Applications to Business, Engineering and Science (DCABES), Kingston, London, UK, September 2–4, IEEE Computer Society, 2013, pp. 16–20.
P. Chevalier and F. Nataf, Symmetrized method with optimized second-order conditions for the Helmholtz equation, Domain Decomposition Methods, 10 (Boulder, CO, 1997), American Mathematical Society, Providence, RI, 1998, pp. 400–407.
A. Davidson, Y. Zhang, and J.D. Owens, An auto-tuned method for solving large tridiagonal systems on the GPU, Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium, Washington, DC, USA, IEEE Computer Society, 2011, pp. 956–965.
A. de La Bourdonnaye, C. Farhat, A. Macedo, F. Magoulès, and F.X. Roux, A non overlapping domain decomposition method for the exterior Helmholtz problem, Contemp. Math. 218 (1998), pp. 42–66. doi: 10.1090/conm/218/03001
B. Després, Domain decomposition method and the Helmholtz problem. II, Second International Conference on Mathematical and Numerical Aspects of Wave Propagation (Newark, DE, 1993), SIAM, Philadelphia, PA, 1993, pp. 197–206.
B. Després, P. Joly, and J.E. Roberts, A domain decomposition method for the harmonic Maxwell equations, in Iterative Methods in Linear Algebra (Brussels, 1991), R. Beauwens and P. de Groen, eds., Elsevier Science Publishers B. V., North-Holland, Amsterdam, 1992, pp. 475–484.
P. Du, R. Weber, P. Luszczek, S. Tomov, G. Peterson, and J. Dongarra, From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming, Parallel Comput. 38(8) (2012), pp. 391–407. doi: 10.1016/j.parco.2011.10.002
P. Du, P. Luszczek, and J. Dongarra, OpenCL Evaluation for Numerical Linear Algebra Library Development, Symposium Application Accelerators in High Performance Computing (SAAHPC10), 2010.
T. Dufaud and D. Tromeur-Dervout, Efficient parallel implementation of the fully algebraic multiplicative Aitken-RAS preconditioning technique, Adv. Eng. Softw. 53 (2012), pp. 33–44. doi: 10.1016/j.advengsoft.2012.07.005
IEEE 754: Standard for Binary Floating-Point Arithmetic, 2008. Available at http://grouper.ieee.org/groups/754 (Accessed September 24, 2014).
M.J. Gander, L. Halpern, and F. Magoulès, An optimized Schwarz method with two-sided Robin transmission conditions for the Helmholtz equation, Int. J. Numer. Methods Fluids 55 (2007), pp. 163–175. doi: 10.1002/fld.1433
M. Gander, L. Halpern, F. Magoulès, and F.X. Roux, Analysis of patch substructuring methods, Int. J. Appl. Math. Comput. Sci. 17 (2007), pp. 395–402. doi: 10.2478/v10006-007-0032-1
M.J. Gander, F. Magoulès, and F. Nataf, Optimized Schwarz methods without overlap for the Helmholtz equation, SIAM J. Sci. Comput. 24 (2002), pp. 38–60.
M. Garbey and D. Tromeur-Dervout, On some Aitken-like acceleration of the Schwarz method, Int. J. Numer. Methods Fluids 40(12) (2002), pp. 1493–1513. doi: 10.1002/fld.407
S. Ghanemi, A domain decomposition method for Helmholtz scattering problems, Ninth International Conference on Domain Decomposition Methods, ddm.org, 1997, pp. 105–112.
T.D. Han and T.S. Abdelrahman, hicuda: High-level GPGPU programming, IEEE Trans. Parallel Distrib. Syst. 22 (2011), pp. 78–90. doi: 10.1109/TPDS.2010.62
C. Janna, M. Ferronato, and G. Gambolati, A block FSAI-ILU parallel preconditioner for symmetric positive definite linear systems, SIAM J. Sci. Comput. 32 (2010), pp. 2468–2484. doi: 10.1137/090779760
C. Japhet and F. Nataf, The best interface conditions for domain decomposition methods: Absorbing boundary conditions, in Absorbing Boundaries and Layers, Domain Decomposition Methods. Applications to Large Scale Computations, L. Tourrette and L. Halpern, eds., Nova Science Publishers Inc., New York, 2001, pp. 348–373.
Khronos Group, The OpenCL Specification, 2010, Available at http://www.khronos.org (Accessed September 24, 2014).
J. Krüger and R. Westermann, Linear algebra operators for GPU implementation of numerical algorithms, ACM Trans. Graph. 22 (2003), pp. 908–916. doi: 10.1145/882262.882363
J. Kruis, Domain Decomposition Methods for Distributed Computing, Saxe-Coburg Publications, Stirling, Scotland, 2007.
R. Li and Y. Saad, GPU-accelerated preconditioned iterative linear solvers, J. Supercomput. 63(2) (2013), pp. 443–466.
P.L. Lions, On the Schwarz alternating method. I, First International Symposium on Domain Decomposition Methods for Partial Differential Equations, SIAM, Philadelphia, PA, 1988, pp. 1–42.
P.L. Lions, On the Schwarz alternating method. II, in Domain Decomposition Methods, T.F. Chan, R. Glowinski, J. Périaux, and O. Widlund, eds., SIAM, Philadelphia, PA, 1989, pp. 47–70.
P.L. Lions, On the Schwarz alternating method. III: A variant for nonoverlapping subdomains, Third International Symposium on Domain Decomposition Methods for Partial Differential Equations, Houston, TX, March 20–22, 1989, SIAM, Philadelphia, PA, 1990, pp. 202–223.
Y. Maday and F. Magoulès, Non-overlapping additive Schwarz methods tuned to highly heterogeneous media, Comptes Rendus à l'Académie des Sci. 341 (2005), pp. 701–705.
Y. Maday and F. Magoulès, Absorbing interface conditions for domain decomposition methods: A general presentation, Comput. Methods Appl. Mech. Eng. 195 (2006), pp. 3880–3900. doi: 10.1016/j.cma.2005.01.025
Y. Maday and F. Magoulès, Improved ad hoc interface conditions for Schwarz solution procedure tuned to highly heterogeneous media, Appl. Math. Model. 30 (2006), pp. 731–743. doi: 10.1016/j.apm.2005.05.020
Y. Maday and F. Magoulès, Optimized Schwarz methods without overlap for highly heterogeneous media, Comput. Methods Appl. Mech. Eng. 196 (2007), pp. 1541–1553. doi: 10.1016/j.cma.2005.05.059
F. Magoulès, A.-K. Cheik Ahamed, and R. Putanowicz, Auto-tuned Krylov methods on cluster of graphics processing unit, Int. J. Comput. Math. 92(6) (2015), pp. 1222–1250.
F. Magoulès, P. Iványi, and B.H.V. Topping, Convergence analysis of Schwarz methods without overlap for the Helmholtz equation, Comput. Struct. 82 (2004), pp. 1835–1847. doi: 10.1016/j.compstruc.2004.02.025
F. Magoulès and F.-X. Roux, Lagrangian formulation of domain decomposition methods: A unified theory, Appl. Math. Model. 30 (2006), pp. 593–615. doi: 10.1016/j.apm.2005.06.016
F. Magoulès, F.-X. Roux, and L. Series, Algebraic way to derive absorbing boundary conditions for the Helmholtz equation, J. Comput. Acoust. 13 (2005), pp. 433–454. doi: 10.1142/S0218396X05002827
F. Magoulès, F.-X. Roux, and L. Series, Algebraic approximation of Dirichlet-to-Neumann maps for the equations of linear elasticity, Comput. Methods Appl. Mech. Eng. 195 (2006), pp. 3742–3759. doi: 10.1016/j.cma.2005.01.022
F. Magoulès, F.-X. Roux, and L. Series, Algebraic Dirichlet-to-Neumann mapping for linear elasticity problems with extreme contrasts in the coefficients, Appl. Math. Model. 30 (2006), pp. 702–713. doi: 10.1016/j.apm.2005.07.008
F. Magoulès, F.-X. Roux, and L. Series, Algebraic approach to absorbing boundary conditions for the Helmholtz equation, Int. J. Comput. Math. 84 (2007), pp. 231–240. doi: 10.1080/00207160601168605
K.K. Matam and K. Kothapalli, Accelerating sparse matrix vector multiplication in iterative methods using GPU, Proceedings of the 2011 International Conference on Parallel Processing (ICPP '11), Washington, DC, USA, IEEE Computer Society, 2011, pp. 612–621.
J. Meredith, D. Bremer, L. Flath, J. Johnson, H. Jones, S. Vaidya, and R. Frank, The GAIA project: Evaluation of GPU-based programming environments for knowledge discovery, Tech. rep., Lawrence Livermore National Labs, Livermore, 2004.
F. Nataf, F. Rogier, and E. de Sturler, Optimal Interface Conditions for Domain Decomposition Methods, CMAP (Ecole Polytechnique) 301 (1994), pp. 1–18.
S. Noury, S. Boivin, and O. Le Maître, A Fast Poisson Solver for OpenCL using Multigrid Methods, in GPU Pro 2, W. Engel, ed., A.K. Peters, 2011, pp. 445–471. ISBN 978-1-56881-718-7.
Nvidia Corporation, CUDA toolkit 4.0, CUBLAS Library, 2011. Available at http://developer.nvidia.com/cuda-toolkit-40 (Accessed September 24, 2014).
J.D. Owens, M. Houston, D. Luebke, S. Green, J.E. Stone, and J.C. Phillips, GPU computing, Proc. IEEE 96 (2008), pp. 879–899. doi: 10.1109/JPROC.2008.917757
J.D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krüger, A.E. Lefohn, and T.J. Purcell, A survey of general-purpose computation on graphics hardware, EUROGRAPHICS (2005). Available at http://www.blackwell-synergy.com/doi/pdf/10.1111/j.1467-8659.2007.01012.x (Accessed March 7, 2015).
A. Quarteroni and A. Valli, Domain Decomposition Methods for Partial Differential Equations, Oxford University Press, Oxford, 1999.
S. Ristov, M. Gusev, L. Djinevski, and S. Arsenovski, Performance impact of reconfigurable L1 cache on GPU devices, Federated Conference on Computer Science and Information Systems (FedCSIS 2013), IEEE Conference Proceedings, Krakow, Poland, September 9–11, 2013, pp. 507–510.
F.X. Roux, F. Magoulès, L. Series, and Y. Boubendir, Approximation of optimal interface boundary conditions for two-Lagrange multiplier FETI method, Proceedings of the 15th International Conference on Domain Decomposition Methods, Berlin, Germany, July 21–25, 2003, R. Kornhuber, R. Hoppe, J. Périaux, O. Pironneau, O. Widlund, and J. Xu, eds., Lecture Notes in Computational Science and Engineering, Springer-Verlag, Heidelberg, 2005, pp. 283–290.
H. Schwarz, Über einen Grenzübergang durch alternierendes Verfahren, Vierteljahrsschrift der Naturforschenden Gesellschaft in Zürich 15 (1870), pp. 272–286.
B. Smith, P. Bjorstad, and W. Gropp, Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations, Cambridge University Press, New York, NY, 1996.
J. Sulaiman, M. Othman, and M.K. Hasan, Nine Point-EDGSOR Iterative method for the finite element solution of 2D Poisson equations, in International Conference Computational Science and Its Applications (ICCSA 2009), Seoul, Korea, June 29–July 2, Lecture Notes in Computer Science, Vol. 5592, O. Gervasi, D. Taniar, B. Murgante, A. Laganà, Y. Mun, and M. Gavrilova, eds., Springer, Berlin and Heidelberg, 2009, pp. 764–774.
J.E. Stone, D. Gohara, and G. Shi, OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems, IEEE Des. Test 12(3) (2010), pp. 66–73.
C.J. Thompson, S. Hahn, and M. Oskin, Using modern graphics architectures for general-purpose computing: a framework and analysis, Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture, MICRO 35, Istanbul, Turkey, IEEE Computer Society Press, Los Alamitos, CA, 2002, pp. 306–317.
P. Tillet, K. Rupp, and S. Selberherr, An Automatic OpenCL Compute Kernel Generator for Basic Linear Algebra Operations, Proceedings of the 2012 Symposium on High Performance Computing, Orlando, FL, 2012, pp. 4:1–4:2.
A. Toselli and O. Widlund, Domain Decomposition Methods: Algorithms and Theory, Springer-Verlag, Berlin, 2005.
L. Tourrette and L. Halpern, Absorbing Boundaries and Layers, Domain Decomposition Methods: Applications to Large Scale Computers, Nova Science Publishers, 2001. Available at http://books.google.fr/books?id=KrCsqv6WStwC.
A.H.E. Zein and A.P. Rendell, Generating optimal CUDA sparse matrix–vector product implementations for evolving GPU hardware, Concurrency Comput: Pract. Exp. 24 (2012), pp. 3–13. doi: 10.1002/cpe.1732
