Original Articles

Supernode transformation on GPGPUs

Pages 181-202 | Received 18 Aug 2016, Accepted 12 Feb 2017, Published online: 16 Mar 2017

References

  • Shang W, Fortes J. Time optimal linear schedules for algorithms with uniform dependencies. IEEE Trans Comput. 1991 Jun;40(6):723–742. doi:10.1109/12.90251
  • Hirschberg D. A linear space algorithm for computing maximal common subsequences. Commun ACM. 1975 Jun;18(6):341–343. doi:10.1145/360825.360861
  • Irigoin F, Triolet R. Supernode partitioning. Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages; San Diego, CA; 1988. p. 319–329.
  • Sinharoy B, Szymanski B. Finding optimum wavefront of parallel computation. J Parallel Algor Appl. 1994;2(1–2):5–26. doi:10.1080/10637199408915404
  • Hodzic E, Shang W. On supernode transformation with minimized total running time. IEEE Trans Parallel Distrib Syst. 1998 May;9(5):417–428. doi:10.1109/71.679213
  • Hodzic E, Shang W. On time optimal supernode shape. IEEE Trans Parallel Distrib Syst. 2002 Dec;1220–1233. doi:10.1109/TPDS.2002.1158261
  • Goumas G, Sotiropoulos A, Koziris N. Minimizing completion time for loop tiling with computation and communication overlapping. Proceedings of IEEE Int’l Parallel and Distributed Processing Symposium (IPDPS’01); 2001 Apr; San Francisco, CA; 2001.
  • Athanasaki M, Sotiropoulos A, Tsoukalas G, et al. Pipelined scheduling of tiled nested loops onto clusters of SMPs using memory mapped network interfaces. Proceedings of the 2002 ACM/IEEE Conference on Supercomputing (SC2002); 2002 Nov; Baltimore, MD; 2002.
  • Cohen A, Girbal S, Parello D, et al. Facilitating the search for compositions of program transformations. ACM ICS 2005: Proceedings of the 19th Annual International Conference on Supercomputing; New York, NY; 2005. p. 151–160.
  • Feautrier P. Some efficient solutions to the affine scheduling problem. I. One-dimensional time. Int J Parallel Prog. 1992;21(5):313–347. doi:10.1007/BF01407835
  • Girbal S, Vasilache N, Bastoul C, et al. Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies. Int J Parallel Prog. 2006;34(3):261–317. doi:10.1007/s10766-006-0012-3
  • Lim A, Liao S, Lam M. Blocking and array contraction across arbitrarily nested loops using affine partitioning. Proceedings of the Eighth ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming; Snowbird, UT; 2001. p. 103–112.
  • Lim A, Cheong G, Lam M. An affine partitioning algorithm to maximize parallelism and minimize communication. Proceedings of the 13th International Conference on Supercomputing; Rhodes; 1999. p. 228–237.
  • Ahmed N, Mateev N, Pingali K. Synthesizing transformations for locality enhancement of imperfectly-nested loop nests. Int J Parallel Prog. 2001 Oct;29(5):493–544.
  • Bondhugula U, Hartono A, Ramanujam J, et al. A practical automatic polyhedral parallelizer and locality optimizer. PLDI 2008: Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation; 2008 Jun; Tucson, AZ; 2008.
  • Ohta H, Saito Y, Kainaga M. Optimal tile size adjustment in compiling general DOACROSS loop nests. In: ACM Press, editor. ICS ’95: Proceedings of the 9th International Conference on Supercomputing. Barcelona: ACM Press; 1995. p. 270–279.
  • Calland PY, Dongarra J, Robert Y. Tiling with limited resources. Proceedings Conference Application Specific Systems, Architectures, and Processors. Zurich: IEEE Computer Society; 1997. p. 229–238.
  • Boulet P, Dongarra J, Robert Y, et al. Tiling for heterogeneous computing platforms. Knoxville, TN: University of Tennessee; 1997 (Technical Report UT-CS-97-373).
  • Athanasaki M, Koukis E, Koziris N. Scheduling of tiled iteration spaces onto a cluster with a fixed number of SMP nodes. Proceedings of the 12th Euromicro Conference on Parallel, Distributed and Network-Based Processing. Coruna: IEEE; 2004.
  • Cormen T, Leiserson C, Rivest R, et al. Introduction to algorithms. Cambridge, MA: MIT Press; 2001.
  • Ukiyama N, Imai H. Parallel multiple alignments and their implementation on CM5. Genome Inform; Yokohama. 1993 Dec;4:103–108.
  • Yang J, Xu Y, Shang Y. An efficient parallel algorithm for longest common subsequence problem on GPUs. WCE 2010 – Proceedings of the World Congress on Engineering; London; 2010. p. 499–504.
  • Jeffrey A. Complex analysis and applications. 2nd ed. Boca Raton, FL: Chapman and Hall/CRC; 2005 Nov. p. 22–23.
  • Nvidia CUDA Programming Guide 2.3. Santa Clara, CA: Nvidia Corporation; 2009.
