International Journal of Parallel, Emergent and Distributed Systems Volume 24, 2009 - Issue 3

Original Articles

Concurrent number cruncher: a GPU implementation of a general sparse linear solver

Luc Buatois (corresponding author) ENSG/CRPG, Gocad Research Group, Nancy University, Rue du Doyen Roubault, BP40 54501, Vandoeuvre-les-Nancy, France; INRIA Lorraine, ALICE, BP 239 – 54506, Vandoeuvre-les-Nancy Cedex, France

Guillaume Caumon ENSG/CRPG, Gocad Research Group, Nancy University, Rue du Doyen Roubault, BP40 54501, Vandoeuvre-les-Nancy, France

Bruno Lévy INRIA Lorraine, ALICE, BP 239 – 54506, Vandoeuvre-les-Nancy Cedex, France

Pages 205-223 | Received 10 Sep 2007, Accepted 27 Jun 2008, Published online: 02 Jun 2009

https://doi.org/10.1080/17445760802337010

Citations (82)

Read on this site (1)

J.L. Ansoni, A.C. Brandi & P. Seleghim. (2015) Resolution of an inverse thermal problem using parallel processing on shared-memory multiprocessor architectures. Inverse Problems in Science and Engineering 23:2, pages 351-375.

Articles from other publishers (81)

Elmira Karimi, Nicolas Bohm Agostini, Shi Dong & David Kaeli. (2022) VCSR: An Efficient GPU Memory-Aware Sparse Format. IEEE Transactions on Parallel and Distributed Systems 33:12, pages 3977-3989.

Weicai Ye, Chenghuan Huang, Jiasheng Huang, Jiajun Li, Yao Lu & Ying Jiang. (2022) An Integral-equation-oriented Vectorized SpMV Algorithm and its Application on CT Imaging Reconstruction.

Huanyu Cui, Nianbin Wang, Yuhua Wang, Qilong Han & Yuezhu Xu. (2021) An effective SPMV based on block strategy and hybrid compression on GPU. The Journal of Supercomputing 78:5, pages 6318-6339.

Venkata Dinavahi & Ning Lin. (2022) Parallel Dynamic and Transient Simulation of Large-Scale Power Systems, pages 71-133.

Ruipeng Li, Björn Sjögreen & Ulrike Meier Yang. (2021) A New Class of AMG Interpolation Methods Based on Matrix-Matrix Multiplications. SIAM Journal on Scientific Computing 43:5, pages S540-S564.

Guoqing Xiao, Kenli Li, Yuedan Chen, Wangquan He, Albert Y. Zomaya & Tao Li. (2021) CASpMV: A Customized and Accelerative SpMV Framework for the Sunway TaihuLight. IEEE Transactions on Parallel and Distributed Systems 32:1, pages 131-146.

Jun Ma, Jun Tao, Chaoli Wang, Can Li, Ching-Kuang Shene & Seung Hyun Kim. (2019) Moving with the flow: an automatic tour of unsteady flow fields. Journal of Visualization 22:6, pages 1125-1144.

Martin Komaritzan & Mario Botsch. (2019) Fast Projective Skinning.

Changxi Liu, Biwei Xie, Xin Liu, Wei Xue, Hailong Yang & Xu Liu. (2018) Towards Efficient SpMV on Sunway Manycore Architectures.

Edoardo Coronado-Barrientos, Guillermo Indalecio & Antonio García-Loureiro. (2018) Improving performance of iterative solvers with the AXC format using the Intel Xeon Phi. The Journal of Supercomputing 74:6, pages 2823-2840.

Salvatore Filippone, Valeria Cardellini, Davide Barbieri & Alessandro Fanfarillo. (2017) Sparse Matrix-Vector Multiplication on GPGPUs. ACM Transactions on Mathematical Software 43:4, pages 1-49.

Mayez A. Al-Mouhamed & Ayaz H. Khan. (2017) SpMV and BiCG-Stab optimization for a class of hepta-diagonal-sparse matrices on GPU. The Journal of Supercomputing 73:9, pages 3761-3795.

Wangdong Yang, Kenli Li & Keqin Li. (2017) A hybrid computing method of SpMV on CPU–GPU heterogeneous computing systems. Journal of Parallel and Distributed Computing 104, pages 49-60.

Jiaquan Gao, Yuanshen Zhou, Guixia He & Yifei Xia. (2017) A multi-GPU parallel optimization model for the preconditioned conjugate gradient algorithm. Parallel Computing 63, pages 1-16.

Jiaquan Gao, Yu Wang, Jun Wang & Ronghua Liang. (2016) Adaptive Optimization Modeling of Preconditioned Conjugate Gradient on Multi-GPUs. ACM Transactions on Parallel Computing 3:3, pages 1-33.

Zhen Xu, Xinzheng Lu & Kincho H. Law. (2016) A computational framework for regional seismic simulation of buildings with multiple fidelity models. Advances in Engineering Software 99, pages 100-110.

Yankai Cao, Arpan Seth & Carl D. Laird. (2016) An augmented Lagrangian interior-point approach for large-scale NLP problems on graphics processing units. Computers & Chemical Engineering 85, pages 76-83.

Kai He, Sheldon X.-D. Tan, Hengyang Zhao, Xue-Xin Liu, Hai Wang & Guoyong Shi. (2016) Parallel GMRES solver for fast analysis of large linear dynamic systems on GPU platforms. Integration 52, pages 10-22.

Gloria Ortega, Julia Lobera, Inmaculada García, M. Pilar Arroyo & Ester M. Garzón. (2015) Parallel resolution of the 3D Helmholtz equation based on multi-graphics processing unit clusters. Concurrency and Computation: Practice and Experience 27:13, pages 3205-3219.

Wangdong Yang, Kenli Li, Zeyao Mo & Keqin Li. (2015) Performance Optimization Using Partitioned SpMV on GPUs and Multicore CPUs. IEEE Transactions on Computers 64:9, pages 2623-2636.

Jiasen Huang, Weina Lu & Junyan Ren. (2015) Greedy approach based heuristics for partitioning SpMxV on FPGAs.

Hui Huang, Xujie Li, Hanli Zhao, Guizhi Nie, Zhongyi Hu & Lei Xiao. (2014) Manifold-preserving image colorization with nonlocal estimation. Multimedia Tools and Applications 74:18, pages 7555-7568.

Motahar Reza, Aman Sinha, Rajkumar Nag & Prasant Mohanty. (2015) CUDA-enabled Hadoop cluster for Sparse Matrix Vector Multiplication.

J. Wong, E. Kuhl & E. Darve. (2015) A new sparse matrix vector multiplication graphics processing unit algorithm designed for finite element problems. International Journal for Numerical Methods in Engineering 102:12, pages 1784-1814.

Han-Li Zhao, Gui-Zhi Nie, Xu-Jie Li, Xiao-Gang Jin & Zhi-Geng Pan. (2015) Structure-Aware Nonlocal Optimization Framework for Image Colorization. Journal of Computer Science and Technology 30:3, pages 478-488.

Jiasen HUANG, Junyan REN & Wei LI. (2015) Greedy Approach Based Heuristics for Partitioning Sparse Matrices. IEICE Transactions on Information and Systems E98.D:10, pages 1847-1851.

I. K. Beisembetov, T. T. Bekibaev, B. K. Assilbekov, U. K. Zhapbasbayev & B. K. Kenzhaliev. HIGH-PERFORMANCE COMPUTING IN OIL RECOVERY SIMULATION BASED ON CUDA.

Yan Zhou, Tian Nan, Ya Li Cui, Tang Pei Cheng & Jing Li Shao. (2014) Numerical Simulation of Groundwater Flow Based on CUDA. Applied Mechanics and Materials 556-562, pages 3527-3531.

Yun Fei, Guodong Rong, Bin Wang & Wenping Wang. (2014) Parallel L-BFGS-B algorithm on GPU. Computers & Graphics 40, pages 1-9.

Jan Bender, Kenny Erleben & Jeff Trinkle. (2014) Interactive Simulation of Rigid Body Dynamics in Computer Graphics. Computer Graphics Forum 33:1, pages 246-270.

Hadrien Courtecuisse, Jérémie Allard, Pierre Kerfriden, Stéphane P.A. Bordas, Stéphane Cotin & Christian Duriez. (2014) Real-time simulation of contact and cutting of heterogeneous soft-tissues. Medical Image Analysis 18:2, pages 394-410.

Jiaquan Gao, Ronghua Liang & Jun Wang. (2014) Research on the conjugate gradient algorithm with a modified incomplete Cholesky preconditioner on GPU. Journal of Parallel and Distributed Computing 74:2, pages 2088-2098.

Zhisong Fu, T. James Lewis, Robert M. Kirby & Ross T. Whitaker. (2014) Architecting the finite element method pipeline for the GPU. Journal of Computational and Applied Mathematics 257, pages 195-211.

Jun Tao, Chaoli Wang, Ching-Kuang Shene & Seung Hyun Kim. (2014) A Deformation Framework for Focus+Context Flow Visualization. IEEE Transactions on Visualization and Computer Graphics 20:1, pages 42-55.

Xiaowen Feng, Hai Jin, Ran Zheng, Zhiyuan Shao & Lei Zhu. (2014) A segment-based sparse matrix-vector multiplication on CUDA. Concurrency and Computation: Practice and Experience 26:1, pages 271-286.

A.V. Gorobets, F.X. Trias & A. Oliva. (2013) A parallel MPI+OpenMP+OpenCL algorithm for hybrid supercomputations of incompressible flows. Computers & Fluids 88, pages 764-772.

Hoang-Vu Dang & Bertil Schmidt. (2013) CUDA-enabled Sparse Matrix–Vector Multiplication on GPUs using atomic operations. Parallel Computing 39:11, pages 737-750.

Xi Chen, Yuxin Jie & Yuzhen Yu. (2013) GPU-accelerated iterative solutions for finite element analysis of soil–structure interaction problems. Computational Geosciences 17:4, pages 723-738.

S. P. Vanka. (2013) 2012 Freeman Scholar Lecture: Computational Fluid Dynamics on Graphics Processing Units. Journal of Fluids Engineering 135:6.

Diego Rodriguez-Losada, Pablo San Segundo, Miguel Hernando, Paloma de la Puente & Alberto Valero-Gomez. (2013) GPU-Mapping: Robotic Map Building with Graphical Multiprocessors. IEEE Robotics & Automation Magazine 20:2, pages 40-51.

Shiming Xu, Wei Xue & Hai Xiang Lin. (2011) Performance modeling and optimization of sparse matrix-vector multiplication on NVIDIA CUDA platform. The Journal of Supercomputing 63:3, pages 710-721.

Yiannis Andreopoulos. (2013) Error Tolerant Multimedia Stream Processing: There's Plenty of Room at the Top (of the System Stack). IEEE Transactions on Multimedia 15:2, pages 291-303.

Jiaquan Gao, Bo Li & Guixia He. (2013) Network and Parallel Computing, pages 298-307.

V. Galiano, H. Migallón, V. Migallón & J. Penadés. (2012) GPU-based parallel algorithms for sparse nonlinear systems. Journal of Parallel and Distributed Computing 72:9, pages 1098-1105.

Rudi Helfenstein & Jonas Koko. (2012) Parallel preconditioned conjugate gradient algorithm on GPU. Journal of Computational and Applied Mathematics 236:15, pages 3584-3590.

G. A. Gravvanis, C. K. Filelis-Papadopoulos & K. M. Giannoutakis. (2011) Solving finite difference linear systems on GPUs: CUDA based Parallel Explicit Preconditioned Biconjugate Conjugate Gradient type Methods. The Journal of Supercomputing 61:3, pages 590-604.

Florian Ries, Tommaso De Marco & Roberto Guerrieri. (2012) Tuning solution of large non-Hermitian linear systems on multiple graphics processing unit accelerated workstations. The International Journal of High Performance Computing Applications 26:3, pages 296-309.

Kun-Chuan Feng, Chaoli Wang, Han-Wei Shen & Tong-Yee Lee. (2012) Coherent Time-Varying Graph Drawing with Multifocus+Context Interaction. IEEE Transactions on Visualization and Computer Graphics 18:8, pages 1330-1342.

Francisco Vázquez, José Jesús Fernández & Ester M. Garzón. (2012) Automatic tuning of the sparse matrix vector product on GPUs based on the ELLR-T approach. Parallel Computing 38:8, pages 408-420.

Vahid Jalili-Marandi, Zhiyin Zhou & V Dinavahi. (2012) Large-Scale Transient Stability Simulation of Electrical Power Systems on Parallel GPUs. IEEE Transactions on Parallel and Distributed Systems 23:7, pages 1255-1266.

V. Jalili-Marandi, Zhiyin Zhou & V. Dinavahi. (2012) Large-scale transient stability simulation of electrical power systems on parallel GPUs.

F. Vázquez, G. Ortega, J.J. Fernández, I. Garcia & E.M. Garzon. (2012) Fast Sparse Matrix Matrix Product Based on ELLR-T and GPU Computing.

Keliang Zhang & Baifeng Wu. (2012) Parallel Sparse Matrix Multiplication for Preconditioning and SSTA on a Many-Core Architecture.

Jérémie Allard, Hadrien Courtecuisse & François Faure. (2012) GPU Computing Gems Jade Edition, pages 281-294.

François Faure, Christian Duriez, Hervé Delingette, Jérémie Allard, Benjamin Gilles, Stéphanie Marchesseau, Hugo Talbot, Hadrien Courtecuisse, Guillaume Bousquet, Igor Peterlik & Stéphane Cotin. (2012) Soft Tissue Biomechanical Modeling for Computer Assisted Surgery, pages 283-321.

Miriam Leeser, Devon Yablonski, Dana Brooks & Laurie Smith King. (2011) The challenges of writing portable, correct and high performance libraries for GPUs. ACM SIGARCH Computer Architecture News 39:4, pages 2-7.

Abu Sayed Md. Mostafizur Rahaman, Jesmin Akhter & Mohammad Touhidur Rahman. (2011) Parallel commutation of sparse linear systems on many core processor.

Zhiyong Huang, Fazhi He, Xiantao Cai, Yuan Chen & Xiao Chen. (2011) A 2D-3D Hybrid Approach to Video Stabilization.

Xujie Li, Hanli Zhao, Xiaogang Jin & Xiaochun Qin. (2011) Real-Time Image Smoothing Based on True Edges.

Mark J. Stock & Adrin Gharakhani. (2011) Graphics Processing Unit-Accelerated Boundary Element Method and Vortex Particle Method. Journal of Aerospace Computing, Information, and Communication 8:7, pages 224-236.

F. Vázquez, J. J. Fernández & E. M. Garzón. (2011) A new approach for sparse matrix vector product on NVIDIA GPUs. Concurrency and Computation: Practice and Experience 23:8, pages 815-826.

Hector Klie, Hari Sudan, Ruipeng Li & Yousef Saad. (2011) Exploiting Capabilities of Many Core Platforms in Reservoir Simulation.

Yu-Shuen Wang, Chaoli Wang, Tong-Yee Lee & Kwan-Liu Ma. (2011) Feature-Preserving Volume Data Reduction and Focus+Context Visualization. IEEE Transactions on Visualization and Computer Graphics 17:2, pages 171-181.

Xintian Yang, Srinivasan Parthasarathy & P. Sadayappan. (2011) Fast sparse matrix-vector multiplication on GPUs. Proceedings of the VLDB Endowment 4:4, pages 231-242.

Yue Zhuo, Xiao-Long Wu, Justin P. Haldar, Thibault Marin, Wen-mei W. Hwu, Zhi-Pei Liang & Bradley P. Sutton. (2011) GPU Computing Gems Emerald Edition, pages 709-722.

Danilo De Donno, Alessandra Esposito, Giuseppina Monti & Luciano Tarricone. (2011) Euro-Par 2010 Parallel Processing Workshops, pages 329-337.

G. Knittel. (2010) A CG-based Poisson solver on a GPU-cluster.

Mario Botsch, Leif Kobbelt, Mark Pauly, Pierre Alliez & Bruno Levy. (2010) Polygon Mesh Processing, pages 203-225.

Wei Cao, Lu Yao, Zongzhe Li, Yongxian Wang & Zhenghua Wang. (2010) Implementing Sparse Matrix-Vector multiplication using CUDA based on a hybrid sparse matrix format.

Maxime R Hugues & Serge G Petiton. (2010) Sparse Matrix Formats Evaluation and Optimization on a GPU.

Yu-Shuen Wang, Hui-Chih Lin, Olga Sorkine & Tong-Yee Lee. (2010) Motion-based video retargeting with optimized crop-and-warp. ACM Transactions on Graphics 29:4, pages 1-9.

Mark Stock & Adrin Gharakhani. (2010) A GPU-Accelerated Boundary Element Method and Vortex Particle Method.

Yun Zeng, Chaohui Wang, Yang Wang, Xianfeng Gu, Dimitris Samaras & Nikos Paragios. (2010) Dense non-rigid surface registration using high-order graph matching.

F. Vázquez, G. Ortega, J.J. Fernández & E.M. Garzón. (2010) Improving the Performance of the Sparse Matrix Vector Product with GPUs.

Juyong Zhang, Chunlin Wu, Jianfei Cai, Jianmin Zheng & Xue‐cheng Tai. (2010) Mesh Snapping: Robust Interactive Mesh Cutting Using Fast Geodesic Curvature Flow. Computer Graphics Forum 29:2, pages 517-526.

Abhijeet Gaikwad & Ioane Muni Toke. (2010) Parallel Iterative Linear Solvers on GPU: A Financial Engineering Case.

Marco Ament, Gunter Knittel, Daniel Weiskopf & Wolfgang Strasser. (2010) A Parallel Preconditioned Conjugate Gradient Solver for the Poisson Problem on a Multi-GPU Platform.

Florian Stock & Andreas Koch. (2010) Parallel Processing and Applied Mathematics, pages 457-466.

Wilson W. L. Fung, Ivan Sham, George Yuan & Tor M. Aamodt. (2009) Dynamic warp formation. ACM Transactions on Architecture and Code Optimization 6:2, pages 1-37.

Hadrien Courtecuisse & Jérémie Allard. (2009) Parallel Dense Gauss-Seidel Algorithm on Many-Core Processors.

Shengjun Liu & Charlie C.L. Wang. (2009) Duplex fitting of zero-level and offset surfaces. Computer-Aided Design 41:4, pages 268-281.
