Name IWANO Koji
Official Title Professor
Affiliation Information Systems, Informatics
Profile My main research interests lie in intelligent information processing of sound media, including voice, music, and ambient sounds. Most of my research uses machine learning-based pattern recognition techniques such as speech recognition and speaker recognition. Specific research examples include the development of intelligent conversational robots using speech recognition, the development of cloud-based processing techniques for multi-speaker conversational speech, the application of speaker verification to security, the development of voice output devices for people with speech impairments, and proposals for new music interfaces based on the relationship between music and color perception. I am also conducting research on application systems using image recognition. For example, our laboratory proposes a system that uses deep learning-based image recognition to detect wild birds in images captured automatically in real environments and to identify their species. Our research results are presented at academic conferences and published in journal papers.
Research Field (Keyword & Summary)
  (1) Speaker verification

    Speaker verification (voice-based personal authentication) has been attracting attention as a non-contact and convenient means of personal authentication. Toward the practical application of speaker verification systems, it is necessary to understand the vulnerabilities of these systems to various attacks and to develop countermeasures against them. Our laboratory is experimentally investigating the vulnerability of speaker verification systems to “voice mimicry attacks” and studying authentication methods that are robust against such attacks.

  (2) Processing multi-speaker conversational speech

    In our laboratory, we are developing methods for intelligently processing multi-speaker conversational speech in which each participant records their own voice with their own smartphone. Specifically, we are proposing methods for "dialogue group detection", which estimates which speakers interacted within the same group, and "speaker diarization", which identifies who spoke when.

  (3) Application of image recognition

    Our laboratory aims to construct various information systems, using image recognition technology, that benefit the future of people and the environment. For example, we propose and develop a Convolutional Neural Network (CNN)-based system that automatically detects birds visiting “bird baths” installed to conserve wild birds, and recognizes the species of the detected birds in order to evaluate the effectiveness of the bird baths.

Representative Papers
  (1) Error Correction Using Long Context Match for Smartphone Speech Recognition, IEICE Transactions on Information and Systems, Vol.E98-D, No.11, November 2015.
  (2) Detection of Overlapped Speech Using Lapel Microphones in Meeting, Speech Communication, Vol.55, No.10, November 2013.
  (3) Spectral Subtraction Based on Non-extensive Statistics for Speech Recognition, IEICE Transactions on Information and Systems, Vol.E96-D, No.8, August 2013.
  (4) Feature Normalization Based on Non-Extensive Statistics for Speech Recognition, Speech Communication, Vol.55, No.5, June 2013.
  (5) A Noise-Robust Speech Recognition Approach Incorporating Normalized Speech/Non-Speech Likelihood into Hypothesis Scores, Speech Communication, Vol.55, No.2, February 2013.
  (6) Differences between Acoustic Characteristics of Spontaneous and Read Speech and their Effects on Speech Recognition Performance, Computer Speech and Language, Vol.22, Iss.2, April 2008.
  (7) Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images, EURASIP Journal on Audio, Speech, and Music Processing, Vol.2007, Article ID 64506, March 2007.
  (8) New Approach to the Polyglot Speech Generation by Means of an HMM-Based Speaker Adaptable Synthesizer, Speech Communication, Vol.48, Iss.10, October 2006.
  (9) Analysis and Recognition of Spontaneous Speech Using Corpus of Spontaneous Japanese, Speech Communication, Vol.47, Iss.1-2, September 2005.
  (10) Multi-Modal Speech Recognition Using Optical-Flow Analysis for Lip Images, Journal of VLSI Signal Processing - Systems for Signal, Image, and Video Technology, Vol.36, Iss.2-3, February 2004.
Award (1) Best Presentation Award, The Eighth Symposium on Biometrics, Recognition and Authentication (SBRA) 2018
(2) IEICE-ISS Distinguished Reviewer Award 2017
Grant-in-Aid for Scientific Research Support: Japan Society for the Promotion of Science (JSPS)
Recruitment of research assistant(s) No
Affiliated academic society (Membership type) (1) IEEE (Regular member)
(2) IEICE: the Institute of Electronics, Information and Communication Engineers (Senior member)
(3) IPSJ: Information Processing Society of Japan (Regular member)
(4) ASJ: Acoustical Society of Japan (Regular member)
(5) ISCA: International Speech Communication Association (Regular member)
Education Field (Undergraduate level) Multimedia Information Processing, Server System Construction, Server Operation, Computer Systems
Education Field (Graduate level) Intelligent Science, Information Technologies and Human Society