I am a Research Engineer at Google DeepMind.
Previously, I was an ETH Postdoctoral Fellow at ETH Zurich, working with Prof. Torsten Hoefler in the Scalable Parallel Computing Laboratory.
I received a Ph.D. in computer science from the Tokyo Institute of Technology, where I was fortunate to be advised by Prof. Rio Yokota.
Publications
- Understanding Gradient Regularization in Deep Learning: Efficient Finite-Difference Computation and Implicit Bias,
Ryo Karakida, Tomoumi Takase, Tomohiro Hayase, and Kazuki Osawa,
International Conference on Machine Learning (ICML 2023).
- PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices,
Kazuki Osawa, Shigang Li, and Torsten Hoefler,
Sixth Conference on Machine Learning and Systems (MLSys 2023).
- Neural Graph Databases,
Maciej Besta, Patrick Iff, Florian Scheidl, Kazuki Osawa, Nikoli Dryden, Michal Podstawski, Tiancheng Chen, and Torsten Hoefler,
Learning on Graphs Conference (LoG 2022).
- Efficient Quantized Sparse Matrix Operations on Tensor Cores,
Shigang Li, Kazuki Osawa, and Torsten Hoefler,
International Conference for High Performance Computing, Networking, Storage and Analysis (SC22), best paper finalist.
- Scalable and Practical Natural Gradient for Large-Scale Deep Learning,
Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Chuan-Sheng Foo, and Rio Yokota,
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 44, no. 1, pp. 404-415, 1 Jan. 2022.
- Understanding Approximate Fisher Information for Fast Convergence of Natural Gradient Descent in Wide Neural Networks,
Ryo Karakida and Kazuki Osawa,
Advances in Neural Information Processing Systems (NeurIPS 2020), oral presentation.
[video][code]
- Rich Information is Affordable: A Systematic Performance Analysis of Second-order Optimization Using K-FAC,
Yuichiro Ueno, Kazuki Osawa, Yohei Tsuji, Akira Naruse, and Rio Yokota,
ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2020).
- Practical Deep Learning with Bayesian Principles,
Kazuki Osawa, Siddharth Swaroop, Anirudh Jain, Runa Eschenhagen, Richard E. Turner, Rio Yokota, and Mohammad Emtiyaz Khan,
Advances in Neural Information Processing Systems (NeurIPS 2019). [poster][code]
- Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks,
Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Rio Yokota, and Satoshi Matsuoka,
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019). [poster][code]
- Performance Optimizations and Analysis of Distributed Deep Learning with Approximated Second-Order Optimization Method,
Yohei Tsuji, Kazuki Osawa, Yuichiro Ueno, Akira Naruse, Rio Yokota, and Satoshi Matsuoka,
The 48th International Conference on Parallel Processing: Workshops (ICPP 2019 Workshop).
- Evaluating the Compression Efficiency of the Filters in Convolutional Neural Networks,
Kazuki Osawa and Rio Yokota,
Artificial Neural Networks and Machine Learning – ICANN 2017, pp. 459-466, Springer, 2017.
Talks
Service
- Served as a reviewer for Neural Networks (2021), NeurIPS 2021, ICLR 2022, NeurIPS 2022, ICLR 2023, NeurIPS 2023, and ICML 2023.
- Selected as a Highlighted Reviewer at ICLR 2022 (top ~8%), April 2022.
Open Source