References
- Abrahams, D. and Gurtovoy, A. 2004. C++ Template Metaprogramming: Concepts, Tools and Techniques from Boost and Beyond, Reading, MA: Addison-Wesley.
- Adams, M.D. and Wise, D.S. 2006. Fast additions on masked integers. SIGPLAN Not., 41(5): 39–45. http://doi.acm.org/10.1145/1149982.1149987
- Adams, M.D. and Wise, D.S. 2006. “Seven at one stroke: Results from a cache-oblivious paradigm for scalable matrix algorithms”. In MSPC'06: Proceedings of the 2006 Workshop on Memory System Performance and Correctness, 41–50. New York: ACM Press. http://doi.acm.org/10.1145/1178597.1178604
- Balay, S., Buschelman, K., Eijkhout, V., Gropp, W.D., Kaushik, D., Knepley, M.G., McInnes, L.C., Smith, B.F. and Zhang, H. 2004. PETSc Users Manual, Technical Report ANL-95/11 – Revision 2.1.5, Argonne, IL: Argonne National Laboratory. http://www.mcs.anl.gov/petsc/petsc-as/
- Barton, J. and Nackman, L. 1994. Scientific and Engineering C++, Reading, MA: Addison-Wesley.
- Chatterjee, S., Lebeck, A.R., Patnala, P.K. and Thottethodi, M. 1999. “Recursive array layouts and fast parallel matrix multiplication”. In Proceedings of the 11th ACM Symposium on Parallel Algorithms and Architectures, 222–231. New York: ACM Press. http://doi.acm.org/10.1145/305619.305645
- Chatterjee, S., Lebeck, A.R., Patnala, P.K. and Thottethodi, M. 2002. Recursive array layouts and fast parallel matrix multiplication. IEEE Trans. Parallel Distrib. Syst., 13: 1105–1123. http://dx.doi.org/10.1109/TPDS.2002.1058095
- Dongarra, J.J., Moler, C.B., Bunch, J.R. and Stewart, G.W. 1979. LINPACK Users' Guide, Philadelphia: Society for Industrial and Applied Mathematics.
- Dongarra, J.J., Meuer, H.W., Strohmaier, E. and Simon, H. 2006. Top 500 supercomputer sites. http://www.top500.org
- FEniCS. 2008. The FEniCS project home page. http://www.fenics.org/wiki/FEniCS_Project
- Frens, J.D. and Wise, D.S. 1997. Auto-blocking matrix multiplication, or tracking BLAS3 performance from source code. SIGPLAN Not., 32(7): 206–216. http://doi.acm.org/10.1145/263764.263789
- Frigo, M., Leiserson, C.E., Prokop, H. and Ramachandran, S. 1999. “Cache-oblivious algorithms”. In Proceedings of the 40th Annual Symposium on Foundations of Computer Science, 285–298. Washington, DC: IEEE Computer Society Press. http://dx.doi.org/10.1109/SFFCS.1999.814600
- Goto, K. GotoBLAS. http://www.tacc.utexas.edu/resources/software/#blas
- Gottschling, P. 2006. Fundamental Algebraic Concepts in Concept-Enabled C++, Technical Report 638, Bloomington: Computer Science Department, Indiana University. http://www.cs.indiana.edu/cgi-bin/techreports/TRNNN.cgi?trnum=TR638
- Gottschling, P. and Lumsdaine, A. 2008. “Integrating semantics and compilation: Using C++ concepts to develop robust and efficient reusable libraries”. In Proceedings of the 7th International Conference on Generative Programming and Component Engineering, 67–76. New York: ACM Press. http://doi.acm.org/10.1145/1449913.1449925
- Gottschling, P., Wise, D.S. and Adams, M.D. 2007. “Representation-transparent matrix algorithms with scalable performance”. In Proceedings of the 21st Annual International Conference on Supercomputing, 116–125. New York: ACM Press. http://doi.acm.org/10.1145/1274971.1274989
- Gottschling, P., Witkowski, T. and Voigt, A. 2008. “Integrating object-oriented and generic programming paradigms in real-world software environments: Experiences with AMDiS and MTL4”. In Seventh Workshop on Parallel/High-Performance Object-Oriented Programming (POOSC'08), Paphos, Cyprus. http://www.c3.lanl.gov/poosc08/finalsubs/gottschling_poosc_amdis.pdf
- Lam, M.S., Rothberg, E.E. and Wolf, M.E. 1991. “The cache performance and optimizations of blocked algorithms”. In Proceedings of the 4th International Symposium on Architectural Support for Programming Languages and Operating Systems, SIGPLAN Notices Vol. 26, 63–74. http://doi.acm.org/10.1145/106975.106981
- Lumsdaine, A., Siek, J., Lee, L.Q. and Gottschling, P. 2006. The Matrix Template Library home page. http://www.osl.iu.edu/research/mtl
- Morton, G.M. 1966. A Computer Oriented Geodetic Data Base and a New Technique in File Sequencing, Technical Report, Ottawa, Ont.: IBM Ltd.
- Musser, D.R., Derge, G.J. and Saini, A. 2001. STL Tutorial and Reference Guide, 2nd ed., Boston, MA: Addison-Wesley.
- Park, N., Hong, B. and Prasanna, V.K. 2003. Tiling, block data layout and memory hierarchy performance. IEEE Trans. Parallel Distrib. Syst., 14: 640–654. http://dx.doi.org/10.1109/TPDS.2003.1214317
- Plagne, L. and Hülsemann, F. 2008. “BTL++: From performance assessment to optimal libraries”. In Computational Science – ICCS 2008, Lecture Notes in Computer Science, Edited by: Bubak, M., van Albada, G.D. and Dongarra, J. Vol. 5103, 203–212. Berlin: Springer. http://dx.doi.org/10.1007/978-3-540-69389-5_24
- Raman, R. and Wise, D.S. 2008. Converting to and from dilated integers. IEEE Trans. Comput., 57: 567–573. http://dx.doi.org/10.1109/TC.2007.70814
- Schrack, G. 1992. Finding neighbors of equal size in linear quadtrees and octrees in constant time. CVGIP: Image Underst., 55: 221–230. http://dx.doi.org/10.1016/1049-9660(92)90022-U
- Siek, J.G. and Lumsdaine, A. 1999. The matrix template library: Generic components for high-performance scientific computing. Comput. Sci. Eng., 1: 70–78. http://dx.doi.org/10.1109/5992.805137
- Spieß, J. 1976. Untersuchungen des Zeitgewinns durch neue Algorithmen zur Matrix-Multiplikation. Computing, 17: 23–36. http://dx.doi.org/10.1007/BF02252257
- Stepanov, A. 1995. The standard template library – how do you build an algorithm that is both generic and efficient? Byte Mag., 20: 177–178.
- Stroustrup, B. 1997. The C++ Programming Language, 3rd ed., Reading, MA: Addison-Wesley.
- Thiyagalingam, J., Beckmann, O. and Kelly, P.H.J. 2006. Is Morton layout competitive for large two-dimensional arrays, yet? Concur. Comput. Prac. Exper., 18: 1509–1539. http://dx.doi.org/10.1002/cpe.1018
- Wise, D.S. 2000. “Ahnentafel indexing into Morton-ordered arrays, or matrix locality for free”. In Euro-Par 2000 – Parallel Processing, Lecture Notes in Computer Science, Edited by: Bode, A., Ludwig, T., Karl, W. and Wismüller, R. Vol. 1900, 774–783. Heidelberg: Springer. http://dx.doi.org/10.1007/3-540-44520-X_108
- Wise, D.S., Citro, C.L., Hursey, J.J., Liu, F. and Rainey, M.A. 2005. “A paradigm for parallel matrix algorithms: Scalable Cholesky”. In Euro-Par 2005 – Parallel Processing, Lecture Notes in Computer Science, Edited by: Cunha, J.C. and Medeiros, P.D. Vol. 3648, 687–698. Berlin: Springer. http://dx.doi.org/10.1007/11549468_76