Simple memory machine models for GPUs

Pages 17-37 | Received 02 May 2012, Accepted 12 Sep 2012, Published online: 27 Nov 2012
