I am a Research Engineer at Google DeepMind.
Previously, I was an ETH Postdoctoral Fellow at ETH Zurich, working with Prof. Torsten Hoefler in the Scalable Parallel Computing Laboratory.
I received a Ph.D. in computer science from the Tokyo Institute of Technology, where I was fortunate to be advised by Prof. Rio Yokota.
Publications
- Understanding Gradient Regularization in Deep Learning: Efficient Finite-Difference Computation and Implicit Bias,
Ryo Karakida, Tomoumi Takase, Tomohiro Hayase, and Kazuki Osawa,
International Conference on Machine Learning (ICML 2023).
- PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices,
Kazuki Osawa, Shigang Li, and Torsten Hoefler,
Sixth Conference on Machine Learning and Systems (MLSys 2023).
- Neural Graph Databases,
Maciej Besta, Patrick Iff, Florian Scheidl, Kazuki Osawa, Nikoli Dryden, Michal Podstawski, Tiancheng Chen, and Torsten Hoefler,
Learning on Graphs Conference (LoG 2022).
- Efficient Quantized Sparse Matrix Operations on Tensor Cores,
Shigang Li, Kazuki Osawa, and Torsten Hoefler,
International Conference for High Performance Computing, Networking, Storage and Analysis (SC22), best paper finalist.
- Scalable and Practical Natural Gradient for Large-Scale Deep Learning,
Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Chuan-Sheng Foo, and Rio Yokota,
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 44, no. 1, pp. 404-415, 1 Jan. 2022.
- Understanding Approximate Fisher Information for Fast Convergence of Natural Gradient Descent in Wide Neural Networks,
Ryo Karakida and Kazuki Osawa,
Advances in Neural Information Processing Systems (NeurIPS 2020), oral presentation.
[video][code]
- Rich Information is Affordable: A Systematic Performance Analysis of Second-order Optimization Using K-FAC,
Yuichiro Ueno, Kazuki Osawa, Yohei Tsuji, Akira Naruse, and Rio Yokota,
ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2020).
- Practical Deep Learning with Bayesian Principles,
Kazuki Osawa, Siddharth Swaroop, Anirudh Jain, Runa Eschenhagen, Richard E. Turner, Rio Yokota, and Mohammad Emtiyaz Khan,
Advances in Neural Information Processing Systems (NeurIPS 2019). [poster][code]
- Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks,
Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Rio Yokota, and Satoshi Matsuoka,
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019). [poster][code]
- Performance Optimizations and Analysis of Distributed Deep Learning with Approximated Second-Order Optimization Method,
Yohei Tsuji, Kazuki Osawa, Yuichiro Ueno, Akira Naruse, Rio Yokota, and Satoshi Matsuoka,
The 48th International Conference on Parallel Processing: Workshops (ICPP 2019 Workshop).
- Evaluating the Compression Efficiency of the Filters in Convolutional Neural Networks,
Kazuki Osawa and Rio Yokota,
Artificial Neural Networks and Machine Learning – ICANN 2017, pp. 459-466, Springer, 2017.
Talks
Service
- Served as a reviewer for Neural Networks (2021), NeurIPS 2021, ICLR 2022, NeurIPS 2022, ICLR 2023, NeurIPS 2023, and ICML 2023.
- Selected as a Highlighted Reviewer at ICLR 2022 (top ~8%), April 2022.
Open Source