Kazushige Goto Joins TACC

A major contributor to high-performance computing has joined the staff of the Texas Advanced Computing Center (TACC) at The University of Texas at Austin (UT Austin). The center announced today that Kazushige Goto, known worldwide for his versions of the Basic Linear Algebra Subprograms (BLAS, pronounced "blahs"), optimized for specific computer processor architectures, will move from Japan to Austin in December to become part of the TACC High Performance Computing group.

"Kazushige has made extraordinary contributions to the HPC field by providing the highest performance BLAS libraries for the leading architectures, and we are very excited that he is joining TACC in order to make many more contributions to high-performance linear algebra," said Dr. Jay Boisseau, Director of TACC.

Goto has attained widespread recognition over the past three years for his computer codes, highly optimized linear algebra "kernels." These make up his BLAS libraries, which resemble other general BLAS libraries with one important exception: each of Goto's BLAS libraries, targeted to a particular computer architecture, has the swiftest routines. Goto's BLAS are optimized for specific chips, such as the Intel Itanium 2, the IBM Power 4, or the AMD Opteron, as are some other advanced libraries, but Goto's are the fastest.

"Kazushige Goto's reputation is well known in our community. His work with the linear algebra libraries that underlie a very large portion of all scientific and engineering computation will be of benefit all across high-performance computing," says Ralph Roskies, Scientific Director of the Pittsburgh Supercomputing Center.

"Those of us who took high-school algebra might recall how to solve systems of two or three linear equations via 'Gaussian elimination.' Those are simple examples of linear algebra problems," Goto says. "Only the cruelest math teacher would make students solve systems with as many as four linear equations. For anything larger, we need computers.
Almost all computational problems in science and engineering reduce at some level to the solution of such systems, usually with hundreds, thousands, or even millions of equations. To solve them quickly, we use the fastest algorithms, and that's where fast kernels come in."

Nearly all major computer vendors and at least one well-known academic project have active efforts to implement basic linear algebra kernels on the latest computer architectures, yet Goto's implementations are widely regarded as the standard by which such efforts must be measured. In fact, most of the fastest machines in the world (ranked on the popular TOP500 list) use his libraries to obtain high performance on the benchmark used for the ranking.

Goto is no stranger to Texas. In 2002-2003, he was a visiting researcher in the laboratory of Dr. Robert van de Geijn, professor of computer sciences at UT Austin, member of the Institute for Computational and Engineering Sciences, and active TACC user and research collaborator.

"Kazushige's kernels have set a new standard of excellence across both industry and academia," says van de Geijn. "What impresses me as an academic is not only his exemplary engineering insight but also his scientific insight into the algorithms that drive these kernels. His implementations are steps toward proof of his conjectures about the algorithms themselves, and their speed is simply a bonus. Our group looks forward to working with him on a wide array of projects involving such libraries, with the aim of setting a similar high standard."

Van de Geijn's enthusiasm matches that of Dr. Mark Seager of the Lawrence Livermore National Laboratory (LLNL). Seager, director of the Platforms program in the Advanced Simulation and Computing effort at LLNL, has also worked with Goto.
"I am delighted to hear that Kazushige Goto is coming back to Texas to work at TACC," he says, "because he's one of the few people on the planet who really understands the intricacies of modern microprocessor hierarchical memory systems. His unique talents allowed him to improve the performance of critical linear algebra operations dramatically, on every microprocessor he has looked into. His key architectural insights permit commodity microprocessors to attain vector-computer-like efficiencies. He is a world-class resource, and TACC is extremely lucky and privileged to have him as a member of the technical staff."