Original Articles

Performance and accuracy of hardware-oriented native-, emulated- and mixed-precision solvers in FEM simulations

Pages 221-256 | Received 01 Dec 2006, Accepted 01 Oct 2006, Published online: 06 Apr 2009

