1. C. B. Moler and J. J. Dongarra, "EISPACK - A Package for Solving Matrix Eigenvalue Problems," in Sources and Development of Mathematical Software, Cowell, Ed., Englewood Cliffs, NJ: Prentice Hall, 1984, pp. 68–87.
A description of the history, context, and significance of the EISPACK project. This was Dongarra’s introduction to the world of mathematical software.
2. J. J. Dongarra and G. W. Stewart, "LINPACK - A Package for Solving Linear Systems," in Sources and Development of Mathematical Software, Cowell, Ed., Englewood Cliffs, NJ: Prentice Hall, 1984.
A thorough account of the development and early history of LINPACK, an important package of numerical software routines. This was the first major project led by Dongarra.
3. J. J. Dongarra, P. Luszczek, and A. Petitet, "The LINPACK Benchmark: Past, Present, and Future," Concurrency and Computation: Practice and Experience, vol. 15, pp. 803–820, 2003.
Describes the adoption of a benchmark derived from LINPACK as the standard means of ranking supercomputer performance, a practice that persists long after LINPACK itself was superseded by newer libraries such as LAPACK.
4. R. C. Whaley, A. Petitet, and J. J. Dongarra, "Automated Empirical Optimization of Software and the ATLAS Project," Parallel Computing, vol. 27, no. 1–2, pp. 3–35, 2001.
Describes a new approach by which the BLAS subroutines (relied on by packages such as LINPACK and ScaLAPACK) can be optimized automatically for a specific hardware platform, by generating and timing many candidate implementations and keeping the fastest.
5. M. Snir, W. Gropp, S. Otto, S. Huss-Lederman, J. Dongarra, and D. Walker, MPI: The Complete Reference, The MPI Core. Cambridge, MA: MIT Press, 1995.
The MPI standard, created in a process led by Dongarra, has been widely adopted as the programming interface for clustered and massively parallel supercomputers. It specifies a standard set of message-passing routines that vendors implement for each new system, allowing the creators of scientific applications and libraries to write portable, high-performance code that runs across a vast range of systems and architectures.
6. J. Langou, J. Langou, P. Luszczek, J. Kurzak, A. Buttari, and J. Dongarra, "Exploiting the Performance of 32 bit Floating Point Arithmetic in Obtaining 64 bit Accuracy (Revisiting Iterative Refinement for Linear Systems)," in SC '06: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, 2006, p. 50.
Explains how combining 32-bit and 64-bit floating-point operations can sometimes retain the accuracy of pure 64-bit arithmetic while delivering substantial efficiency gains, so that jobs run more quickly and need less memory. The expensive factorization of the matrix is carried out in 32-bit arithmetic, and the resulting solution is then improved by iterative refinement, with residuals computed in 64-bit arithmetic. This approach paid off because processor designers had introduced new architectural features that made 32-bit operations disproportionately fast.
7. A. Abdelfattah, A. Haidar, S. Tomov, and J. Dongarra, "Performance, Design, and Autotuning of Batched GEMM for GPUs," in High Performance Computing (ISC High Performance 2016), J. Kunkel, P. Balaji, and J. J. Dongarra, Eds., Lecture Notes in Computer Science, vol. 9697. Cham: Springer, 2016, pp. 21–38.
Another important example of the work of Dongarra’s group to optimize numerical computation for new supercomputing architectures, this time for systems built around graphics processing units (GPUs), processors originally designed for graphics rendering. The paper addresses batched GEMM, in which many small, independent matrix multiplications are performed together so that the GPU’s many cores are kept busy.
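To make the mechanism summarized in entry 6 concrete, here is a minimal Python sketch of mixed-precision iterative refinement. It is an illustration under stated assumptions, not the authors' code: 32-bit arithmetic is only emulated, by rounding every intermediate value through the standard `struct` module (so the sketch shows the numerics, not the speedup); the factorization is a textbook LU with partial pivoting; and the helper names `f32`, `lu_factor_f32`, `lu_solve_f32`, and `mixed_precision_solve` are hypothetical.

```python
import struct


def f32(x):
    """Hypothetical helper: round a Python float (IEEE 64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_factor_f32(A):
    """LU factorization with partial pivoting, every stored value rounded to 32 bits.

    Returns (LU, perm): LU holds the unit lower factor L below the diagonal and
    U on and above it; perm[k] is the original row now in position k.
    """
    n = len(A)
    LU = [[f32(v) for v in row] for row in A]
    perm = list(range(n))
    for k in range(n):
        # Choose the largest pivot in column k and swap whole rows.
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] = f32(LU[i][k] / LU[k][k])  # multiplier, stored in L
            for j in range(k + 1, n):
                LU[i][j] = f32(LU[i][j] - f32(LU[i][k] * LU[k][j]))
    return LU, perm


def lu_solve_f32(LU, perm, b):
    """Solve A x = b with the 32-bit factors, rounding every step to 32 bits."""
    n = len(LU)
    y = [f32(b[perm[i]]) for i in range(n)]  # apply the row permutation
    for i in range(n):  # forward substitution with unit lower L
        for j in range(i):
            y[i] = f32(y[i] - f32(LU[i][j] * y[j]))
    for i in reversed(range(n)):  # back substitution with U
        for j in range(i + 1, n):
            y[i] = f32(y[i] - f32(LU[i][j] * y[j]))
        y[i] = f32(y[i] / LU[i][i])
    return y


def mixed_precision_solve(A, b, iters=5):
    """Factor once in 32-bit, then refine the solution using 64-bit residuals."""
    n = len(A)
    LU, perm = lu_factor_f32(A)      # O(n^3) work, done in low precision
    x = lu_solve_f32(LU, perm, b)    # initial solution, accurate to ~32 bits
    for _ in range(iters):
        # Residual r = b - A x in 64-bit (Python floats are IEEE doubles).
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        d = lu_solve_f32(LU, perm, r)         # correction reuses the 32-bit factors
        x = [x[i] + d[i] for i in range(n)]   # update kept in 64-bit
    return x
```

For example, with `A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]` and `b` chosen so that the exact solution is `[0.1, 0.2, 0.3]`, the plain 32-bit solve is accurate only to about single precision, while `mixed_precision_solve(A, b)` recovers the solution to near 64-bit accuracy. The design point is that the O(n^3) factorization is performed once in low precision and reused, while each refinement step costs only O(n^2); convergence assumes the matrix is not too ill-conditioned relative to 32-bit precision.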