AMAAI Webinar http://dorienherremans.com/webinars
Webinar by Yin-Jyun Luo, SUTD
Title: Disentangled Representation Learning Using Gaussian-Mixture Variational Auto-encoders: Applications for Synthesis and Conversion of Musical Signals
Abstract: Disentangled representation learning aims to uncover the generative factors of variation in data, which enables both the analysis of interpretable features and the synthesis of novel data. In the context of deep learning, variational auto-encoders (VAEs) are among the most popular frameworks for learning disentangled representations. A VAE describes a data-generating process that first samples a latent variable from a prior distribution and then samples an observation from a distribution conditioned on that latent variable. Training a VAE thus captures disentangled representations in the latent variable. In this talk, we present a VAE that learns significant factors of variation for either isolated musical instrument sounds or expressive singing voices [1, 2]. In particular, we exploit a Gaussian-mixture prior distribution over the latent variables of interest, thereby capturing the multi-modality of the data. We verify and demonstrate the model's capability for controllable attribute synthesis and conversion.
[1] Yin-Jyun Luo, Kat Agres, Dorien Herremans, "Learning Disentangled Representations of Timbre and Pitch for Musical Instrument Sounds Using Gaussian Mixture Variational Autoencoders," ISMIR 2019. https://dorienherremans.com/sites/default/files/1912.02613_0.pdf
[2] Yin-Jyun Luo, Chin-Cheng Hsu, Kat Agres, Dorien Herremans, "Singing Voice Conversion with Disentangled Representations of Singer and Vocal Technique Using Variational Autoencoders," ICASSP 2020. https://dorienherremans.com/sites/default/files/jyun-ismir.pdf
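The data-generating process described in the abstract can be sketched in a few lines of PyTorch: draw a mixture component, sample a latent variable from that component's Gaussian, then decode an observation conditioned on it. The sketch below is a minimal illustration under assumed settings, not the code from [1] or [2]; the dimensions, the number of mixture components, and the names Decoder and sample_from_prior are placeholders chosen for the example.

# Minimal sketch of ancestral sampling from a Gaussian-mixture VAE prior,
# followed by decoding an observation conditioned on the sampled latent.
# All sizes and names here are illustrative assumptions, not paper details.
import torch
import torch.nn as nn

N_COMPONENTS = 4   # assumed: e.g. one component per attribute class
LATENT_DIM = 16    # assumed latent dimensionality
DATA_DIM = 128     # assumed observation size, e.g. one spectrogram frame

class Decoder(nn.Module):
    """Maps a latent sample z to the parameters of p(x | z) (here, a mean)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, 64), nn.ReLU(), nn.Linear(64, DATA_DIM)
        )
    def forward(self, z):
        return self.net(z)

# Gaussian-mixture prior: learnable mean and log-variance per component.
prior_means = nn.Parameter(torch.randn(N_COMPONENTS, LATENT_DIM))
prior_logvars = nn.Parameter(torch.zeros(N_COMPONENTS, LATENT_DIM))

def sample_from_prior(n):
    """Pick a component k, then sample z ~ N(mu_k, diag(sigma_k^2))."""
    k = torch.randint(0, N_COMPONENTS, (n,))
    mu, logvar = prior_means[k], prior_logvars[k]
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
    return z, k

decoder = Decoder()
z, component = sample_from_prior(8)
x = decoder(z)       # observations conditioned on the sampled latents
print(x.shape)       # torch.Size([8, 128])

In a trained model, each mixture component can be associated with a discrete attribute (e.g. an instrument or vocal technique), so choosing or swapping the component while keeping the rest of the latent code fixed is what enables controllable synthesis and conversion.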
Category: 📚 Learning