Study at TCU


Name ARAI Shuichi
Official Title Professor
Affiliation Computer Science, Information Technology
Profile My research fields are artificial intelligence, machine learning, and signal processing. My main research interest is knowledge acquisition from multimodal signals. Some representative studies are introduced below.
(1) Language acquisition from multimodal sensors such as microphones and cameras.
This study asks how babies learn their native language through senses such as hearing and vision, and how linguistic symbols are grounded through multimodal interactions in a community. We have proposed several types of models that allow computers to capture the word concepts of nouns and adjectives using a multi-agent framework that contains no symbols. We are currently researching primitive-level agents that can replace deep neural networks.
(2) Audio chord estimation from music signals and estimation of audio chord sequences
This study asks how humans listen to music and what they get or feel from it. Computers can now identify a music title using their own method, called music fingerprinting. However, this way of listening is very different from the human way. When we listen to music, we perceive three primary elements: melody, harmony, and rhythm. These three elements are fundamental to listening to music, and computers need to use them to recognize music as humans do. Over the last several years, we have proposed automatic audio chord estimators based on deep neural networks that achieve state-of-the-art performance. Currently, we are studying rich musical features as input to neural networks. Using these abundant features, we are trying to get the neural network to learn musical chords by focusing on and/or selecting only the appropriate features.
(3) Semantic Image Segmentation
Segmenting objects from a scene is the most basic and important behavior for humans living in this world.
We humans assume that the visual information from our eyes is enough to segment objects with detailed contours. In reality, however, the actual visual information has considerably low resolution, and it is the brain that creates the beautifully segmented map. We have therefore proposed an adaptive focusing neural module for the visual cortex and achieved high performance in accurately segmenting objects.
Research Field (Keyword & Summary)
  1. Audio Chord Estimation

    This study aims to extract or transcribe a sequence of chords from an audio music recording. For many applications in music information retrieval, extracting the harmonic structure of an audio track is very desirable. Recently, various methods using deep learning have been proposed. Most of these methods use a traditional feature such as the CQT or the DFT, even though musical sound consists of both fundamentals, which are spaced logarithmically in frequency, and overtones, which are spaced linearly. This study employs rich features that include both logarithmic-frequency and linear-frequency features, and lets the deep neural network select the features that are effective for estimating chords. This combination of redundant features and deep learning achieved state-of-the-art performance.

  2. Semantic Segmentation of Image

    Semantic segmentation is a key computer vision task that performs pixel-wise object classification. Deep learning with CNNs has recently become the standard approach to semantic segmentation. CNNs are very powerful, but their visual field (receptive field) has a fixed square shape, so many layers are required to segment objects that have complex contours. In addition, CNNs have a bottleneck structure to represent highly abstract features of an object, which inevitably blurs the represented shape of the object. To solve these problems, we proposed a relatively shallow, multi-resolution network using deformable convolution. This method makes it possible to segment not only large objects but also thin and small objects.
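
The idea behind the deformable convolution mentioned in item 2 can be illustrated with a minimal sketch. This is not the network described above: it computes a single 3x3 deformable convolution response at one pixel, the offsets are supplied by hand rather than learned, and the names `bilinear` and `deformable_conv_at` are hypothetical helpers introduced only for this illustration.

```python
import math


def bilinear(img, y, x):
    """Sample img at a fractional (y, x); positions outside the image read as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    fy, fx = y - y0, x - x0
    val = 0.0
    for yy, wy in ((y0, 1.0 - fy), (y0 + 1, fy)):
        for xx, wx in ((x0, 1.0 - fx), (x0 + 1, fx)):
            if 0 <= yy < h and 0 <= xx < w:
                val += wy * wx * img[yy][xx]
    return val


def deformable_conv_at(img, weights, offsets, cy, cx):
    """Response of a 3x3 deformable convolution at output pixel (cy, cx).

    weights: 3x3 list of floats.
    offsets: 3x3 list of (dy, dx) pairs. Each pair shifts one sampling point
             away from the regular square grid. In a real network these
             offsets are predicted by another layer; here they are inputs.
    """
    out = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i][j]
            out += weights[i][j] * bilinear(img, cy + i - 1 + dy, cx + j - 1 + dx)
    return out
```

With all offsets set to (0, 0) this reduces to an ordinary 3x3 convolution on the fixed square grid; non-zero offsets let the sampling points follow a contour that the square grid would cut across.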

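For item 1 (Audio Chord Estimation), the combination of linear-frequency and logarithmic-frequency features can be sketched as follows. This is a rough illustration under stated assumptions, not the feature extractor used in the papers: it uses a naive DFT on one frame, approximates the log-frequency feature by pooling DFT magnitudes into semitone-spaced bands rather than computing a true CQT, and the function names and the parameters `f_min`, `bins_per_octave`, and `n_bins` are hypothetical.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Linear-frequency feature: magnitude of each DFT bin up to Nyquist."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def log_frequency_bins(mags, sample_rate, n_fft, f_min=55.0, bins_per_octave=12, n_bins=48):
    """Log-frequency feature: pool DFT magnitudes into semitone-spaced bands."""
    feats = [0.0] * n_bins
    lowest = f_min * 2.0 ** (-0.5 / bins_per_octave)  # lower edge of band 0
    for k, m in enumerate(mags):
        freq = k * sample_rate / n_fft
        if freq < lowest:
            continue  # also skips the DC bin, where log2 is undefined
        idx = round(bins_per_octave * math.log2(freq / f_min))
        if idx < n_bins:
            feats[idx] += m
    return feats


def rich_features(frame, sample_rate):
    """Concatenate both views so a downstream network can pick what it needs."""
    lin = dft_magnitudes(frame)
    log = log_frequency_bins(lin, sample_rate, len(frame))
    return lin + log
```

The concatenated vector is deliberately redundant: overtones line up at equal spacing in the linear part, while pitches an octave apart sit a fixed number of bins apart in the logarithmic part, and the choice between them is left to the learning stage.
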
Representative Papers
  1. "DNN-LSTM-CRF Model for Automatic Audio Chord Recognition", ACM PRAI, 2018.
  2. "Deep Convolutional Encoder-Decoder Network with Model Uncertainty for Semantic Segmentation", IEEE INISTA, 2017.
  3. "Inference with Model Uncertainty on Indoor Scene for Semantic Segmentation", IEEE GlobalSIP, 2017.
  4. "Fast Gabor Wavelet Transform Based on Synthesis of Gabor Spectrum using Convolution of Gaussian", SampTA, 2015.
  5. "Fast Multiresolution Gabor Transform Based on Synthesis of High Frequency Resolution Spectrum from Low Frequency Resolution Spectra", IEEE GlobalSIP, 2014.
  6. "Complexity Reduction of Continuous Wavelet Transform with Gabor Basis Functions by Convolving Wavelet Coefficient with Wavelet Function", IEICE Trans. on Information and Systems, Vol. J97-D, No. 6, 2014.
  7. "Complexity Reduction of Gabor Filtering using Discrete Convolution of Gaussian", IEICE Trans. on Information and Systems, Vol. J97-D, No. 1, 2014.
  8. "Complexity Reduction of Adaptive Window Analysis by Synthesis of Long-Term Windowed Spectrum from Short-Term Windowed Spectra with 0-padding", IEICE Trans. on Information and Systems, Vol. J96-D, No. 9, 2013.
  9. "Synthesis of Gabor Filtered Signal by Convolving Low-Q Filtered Signal with Gaussian Function for Efficient Computation", IEEE GlobalSIP, 2013.
  10. "Text CAPTCHA by Indefinite Interval Sampling with Circle Packing", Trans. on Information and Systems, Vol. J94-D, No. 4, 2011.
Grant-in-Aid for Scientific Research Support: Japan Society for the Promotion of Science (JSPS)
Affiliated academic society (Membership type) (1) IEEE (member)
(2) IEICE (member)
(3) IPSJ (member)
(4) ASJ (member)
Education Field (Undergraduate level) Digital Signal Processing, Pattern Recognition, Speech Processing
Education Field (Graduate level) Advanced Pattern Processing