This is a personal website introducing my research activities. Please feel free to contact me by e-mail or on LinkedIn.
Short Bio
Dr. Yotaro Kubo received the B.E., M.E., and Dr.Eng. degrees from Waseda University, Tokyo, Japan, in 2007, 2008, and 2010, respectively. He was a visiting scientist at RWTH Aachen University for six months in 2010. He then joined Nippon Telegraph and Telephone Corporation (NTT), where he worked at NTT Communication Science Laboratories. From 2014 to 2019, he was with Amazon (in Aachen, Germany), where he developed and investigated speech recognition for voice search and personal assistants. Since 2019, he has been a research scientist at Google (in Tokyo, Japan). His research interests include generative/ discriminative hybrid modeling, kernel-based probabilistic models, and integration of probabilistic systems. He is a member of the IEEE, the International Speech Communication Association (ISCA), and the Acoustical Society of Japan (ASJ).
Research Interests
- Machine Learning for Speech Signal and Spoken Language Processing
- Generative/ Discriminative-hybrid training of hidden Markov models
- Flat-direct classifiers for automatic speech recognition enhanced by using nonlinear feature transformation
- Deep learning with discrete parameters/ variables for network structure estimation
- Software architecture for efficient research
Publications
Refereed Journal Papers
- Y. Kubo, S. Watanabe, T. Hori, A. Nakamura, "Structural Classification Methods based on Weighted Finite-State Transducers for Automatic Speech Recognition," IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, No. 8, pp. 2240-2251, Oct 2012. (IEEE Xplore)
- Y. Kubo, S. Okawa, A. Kurematsu, K. Shirai, "Temporal AM-FM Combination for Robust Speech Recognition," Speech Communication, Vol. 54, No. 5, pp. 716-725, May 2011. (Science Direct)
- Y. Kubo, S. Watanabe, A. Nakamura, E. McDermott, T. Kobayashi, "A Sequential Pattern Classifier Based on Hidden Markov Kernel Machine and Its Application to Phoneme Classification," IEEE Journal of Selected Topics in Signal Processing, Vol. 4, No. 6, pp. 974-984, December 2010. (IEEE Xplore)
- Y. Kubo, S. Okawa, A. Kurematsu, K. Shirai, "Recognizing Reverberant Speech Based on Amplitude and Frequency Modulation," IEICE Trans. on Inf. and Syst., Vol. E-61-D, No., pp. 448-456, March 2008.
- Y. Kubo, M. Honda, K. Shirai, T. Komori, S. Nobumasa, T. Takagi, "An Improved High-quality MPEG-2/4 Advanced Audio Coding Encoder," Acoustical Science & Technology, Vol. 29, No. 6, pp. 362-371, December 2008. (Full Text via JStage)
- M. Delcroix, K. Kinoshita, T. Nakatani, S. Araki, A. Ogawa, T. Hori, S. Watanabe, M. Fujimoto, T. Yoshioka, T. Oba, Y. Kubo, M. Souden, S.-J. Hahm, A. Nakamura, "Speech recognition in living rooms: Integrated speech enhancement and recognition system based on spatial, spectral & temporal modeling of sounds," Computer Speech and Language, Vol. 27, No. 3, pp. 851-873, Elsevier. (Science Direct)
- M. Delcroix, T. Yoshioka, A. Ogawa, Y. Kubo, M. Fujimoto, N. Ito, K. Kinoshita, M. Espi, S. Araki, T. Hori, T. Nakatani, "Strategies for distant speech recognition in reverberant environments," EURASIP Journal on Advances in Signal Processing, (2015) 2015: 60. https://doi.org/10.1186/s13634-015-0245-7.
Refereed Conference/ Workshop Papers
- Y. Kubo, S. Karita, M. Bacchiani, "Knowledge Transfer from Large-Scale Pretrained Language Models to End-to-end Speech Recognizers," Proc. ICASSP-2022, Singapore, May 2022.
- Y. Kubo, M. Bacchiani, "Joint Phoneme-Grapheme Model for End-to-end Speech Recognition," Proc. ICASSP-2020, Barcelona, Spain, May 2020.
- Y. Kubo, G. Tucker, S. Wiesler, "Compacting Neural Network Classifiers via Dropout Training," Proc. NIPS Workshop on Efficient Methods for Deep Neural Networks, Barcelona, Spain, Dec 2016. (ArXiv)
- Y. Kubo, J. Suzuki, T. Hori, A. Nakamura, "Restructuring Output Layers of Deep Neural Networks Using Minimum Risk Parameter Clustering," Proc. Interspeech 2014, Singapore, Sept 2014. [pdf]
- Y. Kubo, T. Hori, A. Nakamura, "A Method for Structure Estimation of Weighted Finite-State Transducers and Its Application To Grapheme-to-Phoneme Conversion," Proc. Interspeech 2013, Lyon, France, August 2013.
- Y. Kubo, T. Hori, A. Nakamura, "Large Vocabulary Continuous Speech Recognition Based on WFST Structured Classifiers and Deep Bottleneck Features," Proc. ICASSP 2013, Vancouver, Canada, May 2013. [pdf]
- Y. Kubo, T. Hori, A. Nakamura, "Integrating Deep Neural Networks into Structured Classification Approach based on Weighted Finite-State Transducers," Proc. INTERSPEECH 2012, Portland, Oregon, U.S., September 2012. [pdf]
- Y. Kubo, S. Watanabe, A. Nakamura, "Decoding Network Optimization Using Minimum Transition Error Training," Proc. ICASSP 2012, Kyoto, Japan, pp. 4197-4200, March 2012. [pdf]
- Y. Kubo, S. Watanabe, A. Nakamura, S. Wiesler, R. Schlueter, H. Ney, "Basis Vector Orthogonalization for an Improved Kernel Gradient Matching Pursuit Method," Proc. ICASSP 2012, Kyoto, Japan, pp. 1909-1912, March 2012. [pdf]
- Y. Kubo, S. Wiesler, R. Schlueter, H. Ney, S. Watanabe, A. Nakamura, T. Kobayashi, "Subspace Pursuit Method for Kernel-Log-Linear Models," Proc. ICASSP 2011, Prague, Czech Republic, May 2011. [pdf]
- Y. Kubo, S. Watanabe, A. Nakamura, T. Kobayashi, "A Regularized Discriminative Training Method of Acoustic Models Derived by Minimum Relative Entropy Discrimination," Proc. INTERSPEECH-2010, Makuhari, Japan, September 2010. [pdf]
- Y. Kubo, S. Okawa, A. Kurematsu, K. Shirai, "A Comparative Study on AM and FM Features," Proc. Interspeech-2008, Brisbane, September 2008. [pdf]
- Y. Kubo, S. Okawa, A. Kurematsu, K. Shirai, "Independent Feature Selection Algorithms for the Creation of Multistream Speech Recognizers," Proc. ITRW on Speech Analysis and Processing for Knowledge Discovery, Aalborg, June 2008.
- Y. Kubo, S. Okawa, A. Kurematsu, K. Shirai, "Noisy Speech Recognition Using Temporal AM-FM Combination," Proc. ICASSP-2008, Las Vegas, pp. 4709-4712, April 2008. [pdf]
- Y. Kubo, S. Okawa, A. Kurematsu, K. Shirai, "A Study on Temporal Features Derived by Analytic Signal," Proc. Interspeech-2007, Antwerpen, pp. 1130-1133, September 2007. [pdf]
- S. Karita, Y. Kubo, M. Bacchiani, L. Jones, "A Comparative Study on Neural Architectures and Training Methods for Japanese Speech Recognition," Proc. INTERSPEECH-2021, Brno, Czech Republic (online presentation), Sept 2021.
- M. Espi, M. Fujimoto, Y. Kubo, T. Nakatani, "Spectrogram Patch Based Acoustic Event Detection and Classification in Speech Overlapping Conditions," Proc. HSCMA, Nancy, France, May 2014.
- M. Delcroix, T. Yoshioka, A. Ogawa, Y. Kubo, M. Fujimoto, N. Ito, K. Kinoshita, M. Espi, T. Hori, T. Nakatani, A. Nakamura, "Linear Prediction-Based Dereverberation With Advanced Speech Enhancement and Recognition Technologies for The Reverb Challenge," Proc. REVERB Workshop, Florence, Italy, May 2014.
- M. Fujimoto, Y. Kubo, T. Nakatani, "Unsupervised non-parametric Bayesian modeling of non-stationary noise for model-based noise suppression," Proc. ICASSP 2014, Florence, Italy, May 2014.
- T. Hori, Y. Kubo, A. Nakamura, "Real-time one-pass decoding with recurrent neural network language model for speech recognition," Proc. ICASSP 2014, Florence, Italy, May 2014.
- M. Blondel, Y. Kubo, N. Ueda, "Online Passive-Aggressive Algorithms for Non-Negative Matrix Factorization and Completion," Proc. AISTATS 2014, Reykjavik, Iceland, April 2014.
- M. Delcroix, Y. Kubo, T. Nakatani, A. Nakamura, "Is Speech Enhancement Pre-Processing Still Relevant When Using Deep Neural Networks for Acoustic Modeling?" Proc. Interspeech 2013, Lyon, France, August 2013.
- S. Watanabe, Y. Kubo, T. Oba, T. Hori, A. Nakamura, "Bag of Arcs: New Representation of Speech Segment Features Based on Finite State Machines," Proc. ICASSP 2012, Kyoto, Japan, pp. 4201-4204, March 2012.
- M. Delcroix, K. Kinoshita, T. Nakatani, S. Araki, A. Ogawa, T. Hori, S. Watanabe, M. Fujimoto, T. Yoshioka, T. Oba, Y. Kubo, M. Souden, S.-J. Hahm, A. Nakamura, "Speech recognition in the presence of highly non-stationary noise based on spatial, spectral and temporal speech/ noise modeling combined with dynamic variance adaptation," Proc. CHiME (Computational Hearing in Multisource Environments) 2011, September 2011.
- S. Wiesler, A. Richard, Y. Kubo, R. Schlueter, H. Ney, "Feature Selection for Log-Linear Acoustic Models," Proc. ICASSP 2011, Prague, Czech Republic, May 2011.
Theses
- Y. Kubo, "Automatic Speech Recognition Based on Temporal Analysis of Amplitude and Frequency Modulation," Master of Informatics and Computer Science, Waseda University, 2008 (written in Japanese).
- Y. Kubo, "Regularized Discrimination of High-Dimensional Signal Representations for Automatic Speech Recognition," Doctor of Engineering, Waseda University, 2010 (pdf) (HTML)
Tutorial Articles (in Japanese)
- Y. Kubo, "深層学習が支える音声認識技術 (Automatic Speech Recognition Technologies Boosted by Deep Learning)," The Transactions of IEICE, May 2022 (to appear).
- Y. Kubo, "音声認識のための深層学習 (Deep Learning for Speech Recognition)," Journal of the Japanese Society for Artificial Intelligence, Vol. 29, No. 1, pp. 62-71, Jan 2014.
- T. Hori, S. Araki, Y. Kubo, A. Ogawa, T. Oba, A. Nakamura, "自然な会話を聞き取る音声認識技術 (Speech Recognition Technologies for Natural Conversation Scenes)," Nikkei Electronics, 2013.10.24, Oct 2013.
- Y. Kubo, A. Ogawa, T. Hori, A. Nakamura, "Speech Recognition Based on Unified Model of Acoustic and Language Aspects of Speech," NTT Technical Review, Vol.11, No.12. (English Version in "NTT Technical Review") or (Japanese Version in "NTT技術ジャーナル")
- Y. Kubo, "ディープラーニングによるパターン認識 (Deep Learning for Pattern Recognition)," IPSJ Magazine, Vol. 54, No. 5, pp. 500-508, May 2013 (IPSJ Digital Library).
- K. Shirai, T. Kobayashi, M. Abe, K. Iwata, R. Imai, H. Kikuchi, K. Ohtsuki, H. Fujisawa, M. Honda, Y. Hayashi, K. Mano, T. Takezawa, S. Takahashi, S. Okawa, K. Hoashi, N. Masaki, N. Osaka, "音声言語処理の潮流 (The Tide of Spoken Language Processing)," Corona Publishing, Mar 2010 (I wrote explanations about tandem approach and neural networks).
- Y. Kubo, "Mac OS Xのアプリケーション開発 (Developing Applications for Mac OS X)," UNIX Magazine 2006.03, Mar 2006.
Books
- S. Asoh, M. Yasuda, S. Maeda, D. Okanohara, T. Okatani, Y. Kubo, D. Bollegala (Ed: T. Kamishima), "深層学習 --Deep Learning--," Kindai Kagakusya, Nov 2015. (written in Japanese; Amazon; also available in Korean http://jpub.tistory.com/m/779)
- Ed: Acoustical Society of Japan, Ed: Y. Haneda, Ed: S. Okawa, Ed: S. Kiya, "音響学入門ペディア -- Acousticpedia for Beginners --," CORONA Publishing, Mar 2017. (written in Japanese; Amazon)
- Y. Kubo (Ed: Acoustical Society of Japan), "機械学習による音声認識 -- Machine Learning in Automatic Speech Recognition --," CORONA Publishing, Apr 2021. (written in Japanese; Amazon)
Domestic Workshop Papers (Not refereed; excerpt)
Academic Activities
- Member of IEEE (The Institute of Electrical and Electronics Engineers)
- Member of ISCA (International Speech Communication Association)
- Member of ASJ (The Acoustical Society of Japan)
- Reviewer for the following scientific journals:
  - IEEE Transactions on Signal Processing
  - IEEE Transactions on Audio, Speech, and Language Processing
  - Speech Communication
  - IEICE Transactions on Information and Systems
Talks/ Lectures
- "A Speech Recognition Toolkit based on Python", EuroSciPy-2010, Paris, France, July 2010.
- "An application method of minimum relative entropy discrimination for hidden Markov models," InterACT Talk (Karlsruhe University), Karlsruhe, Germany, Sep. 2010. (Host: Dr. Sebastian Stueker)
- "Subspace Pursuit Methods for Kernel-Log-Linear Models," in National Institute of Information and Communications Technology (NICT), Kyoto, Japan, Nov. 2011.
- "High-Dimensional Log-Linear Models for Automatic Speech Recognition," in Microsoft Research Asia, Beijing, China, Jan. 2012.
- "Python in Automatic Speech Recognition Research (音声認識研究におけるPython)," Tokyo.SciPy #004 (aka Kan.SciPy #001), Jun. 2012.
- "Automatic Recognition of Conversational Speech," in Microsoft Research Redmond, WA, USA, Sep. 2012. (with Dr. Seong-Jun Hahm)
- "Basics and Outlooks of Deep Learning (ディープラーニングの基礎と展望)," in ALAGIN Young Researchers' Workshop (Tokyo University), Tokyo, Japan, Dec. 2012.
- "Recent Developments in Deep Neural Networks for Automatic Speech Recognition: Methods and Applications," in Theme Workshop of FIRST Aihara Project (Tokyo University), Tokyo, Japan, Mar. 2013. (Host: Dr. Takaki Makino)
- "Recent developments in speech recognition technologies (音声認識技術の現在と最先端)," in Nara Institute of Science and Technology (NAIST), Nara, Japan, May 2013. (in Japanese; Invited Lecture; Host: Dr. Graham Neubig)
- "Practical kernel methods for automatic speech recognition," in Mitsubishi Electric Research Laboratories, MA, USA, May 2013. (Host: Dr. Shinji Watanabe)
- "WFST-based structured classification for meeting recognition," in SLS Seminar at Massachusetts Institute of Technology, MA, USA, May 2013. (Host: Prof. James Glass)
- "Integration of structured classification and deep neural networks for automatic speech recognition," in Midwest Speech and Language Days (Toyota Technological Institute Chicago), IL, USA, May 2013. (Invited Talk; Host: Prof. Sadaoki Furui)
- "Deep Learning and its application to Automatic Speech Recognition (Deep Learningとその音声認識への応用)," in Waseda University, Tokyo, Japan, Jun. 2013. (in Japanese; Host: Prof. Tetsunori Kobayashi)
- "Basics of Deep Learning and Speech Recognition (深層学習と音声認識の基本)," in ALAGIN Speech Processing Seminar, Tokyo, Japan, Oct. 2013. (in Japanese)
- "Recent studies on Deep Learning for Speech Recognition (音声認識分野における深層学習技術の研究動向)," in The 16th Information-Based Induction Sciences Workshop, Tokyo, Japan, Nov. 2013. (in Japanese; Invited Talk)
- "Deep Learning (ディープラーニング技術)," in NHK Science & Technology Research Laboratories, Tokyo, Japan, Nov. 2013. (in Japanese; Host: Dr. Shoei Sato)
- "Deep Learning and Its Application to Pattern Recognition Problems (深層学習とそのパターン認識への応用)," in CS Colloquium at Tsukuba University, Ibaraki, Japan, Dec. 2013. (in Japanese; Host: Dr. Hideitsu Hino)
- "Applications of Deep Learning in Speech Recognition (音声認識における深層学習の活用とその進展)," invited talk in Ongaku-Symposium 2014.
- "Advances in Speech Recognition for Digital Assistants," invited talk in Industry Forum in IEEE ICNC-2020.
- "Neural speech recognition," Tutorial in ISCA Speaker Odyssey Workshop 2020 (Co-organized with Shigeki Karita).
Degrees
- Doctor of Engineering from Waseda University, Tokyo, Japan (2010)
- Master of Informatics and Computer Science from Waseda University, Tokyo, Japan (2008)
- Bachelor of Informatics and Computer Science from Waseda University, Tokyo, Japan (2007)
Awards
- IEICE ISS Young Researcher's Award in Speech Field, 2013.
- The Itakura Award from the Acoustical Society of Japan (ASJ), 2013.
- IEEE SPS Japan Chapter Student Paper Award, 2011.
- The Yamashita SIG Research Award from the Information Processing Society of Japan (IPSJ), 2011.
- The Awaya Award from the Acoustical Society of Japan (ASJ), 2010.