References
- an Mey , D. 2003 . “ Two OpenMP programming patterns ” . In Proceedings of the Fifth European Workshop on OpenMP (EWOMP2003)
- Barrett , R. , Berry , M. , Chan , T.F. , Demmel , J. , Donato , J. , Dongarra , J. , Eijkhout , V. , Pozo , R. , Romine , C. and van der Vorst , H. 1994 . Templates for the solution of linear systems: building blocks for iterative methods . SIAM .
- Bircsak , J. , Craig , P. , Crowell , R. , Cvetanovic , Z. , Harris , J. , Alexander , N.C. and Offner , C.D. 2000 . Extending OpenMP for NUMA machines . Scientific Programming , 8 : 163 – 181 .
- Brehm , J. and Jordan , H.F. 1989 . “ Parallelizing algorithms for MIMD architectures with shared memory ” . In Proceedings of the 3rd International Conference on Supercomputing , 244 – 253 . ACM Press .
- Bull , J.M. and Johnson , C. 2002 . “ Data distribution, migration and replication on a cc-NUMA architecture ” . In Proceedings of the Fourth European Workshop on OpenMP http://www.caspur.it/ewomp2002/
- Burgess, D.A. and Giles, M.B., Renumbering unstructured grids to improve the performance of codes on hierarchical memory machines, Technical report, Oxford University Computing Laboratory, Numerical Analysis Group, May 1995.
- Charlesworth , A. 1998 . Starfire: extending the SMP envelope . IEEE Micro , 18 ( 1 ) : 39 – 49 .
- Charlesworth , A. 2001 . “ The Sun Fireplane system interconnect ” . In Proceedings of the 2001 ACM/IEEE Conference on Supercomputing (CDROM) , 7 ACM Press .
- Chronopoulos , A.T. and Gear , C.W. 1989 . S-step iterative methods for symmetric linear systems . Journal of Computational and Applied Mathematics , 25 : 153 – 168 .
- Cuthill , E. and McKee , J. 1969 . “ Reducing the bandwidth of sparse symmetric matrices ” . In Proceedings of the 1969 24th National Conference , 157 – 172 . ACM Press .
- Dongarra, J. and Eijkhout, V., Finite-choice algorithm optimization in conjugate gradients, Technical Report UT-CS-03-502, Lapack Working Note 159. University of Tennessee Computer Science Report, January 2003.
- Dongarra , J. , Foster , I. , Fox , G. , Gropp , W. , Kennedy , K. , Torczon , L. and White , A. 2003 . Sourcebook of Parallel Computing , Morgan Kaufmann .
- Dongarra , J.J. , Duff , I.S. , Sorensen , D.C. and van der Vorst , H.A. 1998 . Numerical linear algebra for high performance computers . SIAM .
- Edelvik , F. Hybrid solvers for the Maxwell equations in time-domain . Doctoral thesis . Mathematics and Computer Science, Department of Information Technology, University of Uppsala .
- Gibbs , N.E. , Poole , W.G. Jr. and Stockmeyer , P.K. 1976 . An algorithm for reducing the bandwidth and profile of a sparse matrix . SIAM Journal on Numerical Analysis , 13 ( 2 ) : 236 – 250 .
- Golub , G.H. and O'Leary , D.P. 1989 . Some history of the conjugate gradient and Lanczos methods . SIAM Review , 31 : 50 – 102 .
- Haveraaen , M. and Hundvebakke , H. 2001 . “ Some statistical performance estimation techniques for dynamic machines ” . In Norsk Informatikkonferanse (NIK 2001) http://www.nik.no/2001/17-haveraaen.pdf
- Löf , H. , Holmgren , S. and Norden , M. 2004 . “ Improving geographical locality of data for shared memory implementations of PDE solvers ” . In Computational Science—ICCS 2004: 4th International Conference, Kraków, Poland, June 6–9, 2004, Proceedings, Part II 9 – 16 . http://www.springerlink.com/openurl.asp?genre=article&issn=0302-9743&volume=3037&spage=9
- Karypis , G. and Kumar , V. 1999 . A fast and high quality multilevel scheme for partitioning irregular graphs . SIAM Journal on Scientific Computing , 20 ( 1 ) : 359 – 392 .
- Laudon , J. and Lenoski , D. 1997 . “ The SGI Origin: a ccNUMA highly scalable server ” . In Proceedings of the 24th Annual International Symposium on Computer Architecture , 241 – 251 . ACM Press .
- Löf , H. and Holmgren , S. 2005 . “ Affinity-on-next-touch: increasing the performance of an industrial PDE solver on a cc-NUMA system ” . In Proceedings of the 19th ACM International Conference on Supercomputing , 387 – 392 . ACM Press .
- Nikolopoulos , D.S. and Papatheodorou , T.S. 2000 . A transparent runtime data distribution engine for OpenMP . Scientific Programming , 8 : 143 – 162 .
- Oliker , L. , Li , X. , Husbands , P. and Biswas , R. 2002 . Effects of ordering strategies and programming paradigms on sparse matrix computations . SIAM Review , 44 ( 3 ) : 373 – 393 .
- Pinar , A. and Heath , M.T. 1999 . “ Improving performance of sparse matrix-vector multiplication ” . In Proceedings of the 1999 ACM/IEEE Conference on Supercomputing (CDROM) , 30 ACM Press .
- Sun Microsystems, Solaris Memory Placement Optimization and Sun Fire servers, January 2003. http://www.sun.com/servers/wp/docs/mpo_v7_CUSTOMER.pdf .
- Sun Microsystems, 2003, UltraSPARC III Cu user's manual, http://www.sun.com/processors/manuals
- Toledo , S. 1997 . Improving the memory-system performance of sparse-matrix vector multiplication . IBM Journal of Research and Development , 41 ( 6 ) : 711 – 725 .
- van der Vorst , H.A. 2003 . Iterative Krylov methods for large linear systems , Number 13 in Cambridge Monographs on Applied and Computational Mathematics . Cambridge University Press .
- Vuduc , R. , Demmel , J.W. , Yelick , K.A. , Kamil , S. , Nishtala , R. and Lee , B. 2002 . “ Performance optimizations and bounds for sparse matrix-vector multiply ” . In Proceedings of the 2002 ACM/IEEE Conference on Supercomputing , 1 – 35 . IEEE Computer Society Press .