Tutorial 11: Dictionary Learning for Sparse Representations: Algorithms and Applications

Monday, May 27, 2-5 pm

Presented by

Wei Dai, Wenwu Wang, Boris Mailhé


Sparse representation has shown great potential in signal processing and machine learning. Because its performance relies heavily on the dictionary used for sparse approximation, designing or learning an appropriate dictionary is crucial. While pre-defined dictionaries have been widely adopted in many applications (for example, DCT and wavelet dictionaries for image compression), learning a dictionary directly from data often adapts the dictionary better to the signals of interest, and has been successful in applications where pre-defined dictionaries are not available or applicable. This tutorial will present various algorithms for dictionary learning and the associated applications. Dictionary learning algorithms are often built on an optimization process that iterates between two stages: sparse approximation and dictionary update. The optimization formulations and algorithms for both stages will be discussed. The links and differences among these algorithms, such as the well-established method of optimal directions (MOD) and K-SVD, and the recently proposed Simultaneous Codeword Optimisation (SimCO) and Large step Gradient Descent (LGD), will be detailed. Extensions, for example to analysis operator learning, will be addressed. Applications of dictionary learning, including image denoising, audio-visual joint tracking, and blind source separation, will be demonstrated, where learned dictionaries either outperform or have wider applicability than pre-defined dictionaries.
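As a rough illustration of the two-stage iteration described above, the following minimal sketch alternates a greedy sparse approximation step with a MOD-style least-squares dictionary update. It is not the presenters' implementation: the function names `omp` and `mod_dictionary_learning`, the choice of orthogonal matching pursuit for the sparse approximation stage, the random re-initialisation of unused atoms, and all parameter values are assumptions made for this example.

```python
import numpy as np

def omp(D, y, k):
    """Greedy sparse approximation of y using at most k atoms (columns) of D."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Select the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break  # nothing new to add (residual is already ~0)
        support.append(j)
        # Least-squares fit of y on the atoms selected so far.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

def sparse_code(D, Y, k):
    """Stage 1: sparse approximation of every column of Y."""
    return np.column_stack([omp(D, Y[:, i], k) for i in range(Y.shape[1])])

def mod_dictionary_learning(Y, n_atoms, k, n_iter=10, seed=0):
    """Alternate sparse approximation and a MOD-style dictionary update."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Y.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_iter):
        X = sparse_code(D, Y, k)
        # Stage 2: least-squares dictionary update, D = Y X^+ (pseudo-inverse).
        D = Y @ np.linalg.pinv(X)
        # Assumption of this sketch: re-draw atoms that were never used.
        unused = np.linalg.norm(D, axis=0) < 1e-12
        D[:, unused] = rng.standard_normal((Y.shape[0], int(unused.sum())))
        D /= np.linalg.norm(D, axis=0, keepdims=True)
    # Final coding pass so that X matches the returned (normalised) D.
    return D, sparse_code(D, Y, k)

# Synthetic data: 60 signals of dimension 8, each built from 2 of 12 atoms.
rng = np.random.default_rng(1)
D_true = rng.standard_normal((8, 12))
D_true /= np.linalg.norm(D_true, axis=0, keepdims=True)
X_true = np.zeros((12, 60))
for i in range(60):
    X_true[rng.choice(12, size=2, replace=False), i] = rng.standard_normal(2)
Y = D_true @ X_true

D, X = mod_dictionary_learning(Y, n_atoms=12, k=2, n_iter=5)
rel_err = np.linalg.norm(Y - D @ X) / np.linalg.norm(Y)
print(f"relative approximation error: {rel_err:.3f}")
```

The pseudo-inverse update is the closed-form least-squares solution for the dictionary when the sparse codes are held fixed; other algorithms in the outline replace this step, for example with atom-by-atom or gradient-based updates.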

Part I: Dictionary learning: problem formulation (30 minutes)

  • Sparse representation: which one is the best?
  • The optimization framework
Part II: Dictionary learning: optimisation strategies and algorithms (60 minutes)
  • Olshausen & Field's algorithm
  • Method of optimal directions (MOD) dictionary learning algorithm
  • Recursive least squares dictionary learning algorithm (RLS-DLA)
  • K-SVD (primitive and improved K-SVD)
  • Large step Gradient Descent (LGD)
  • Simultaneous codeword optimisation (SimCO), including primitive, regularised, and weighted SimCO
  • Greedy adaptive dictionary (GAD) learning algorithms
  • Other dictionary algorithms (such as structure-oriented parametric dictionary learning algorithms)
Part III: Dictionary learning: performance constraints and practical issues (40 minutes)
  • Sparsity and coherence issues in dictionary learning
  • Translation-invariant, shift-invariant, and orthogonality constraints and adapted algorithms
  • Robustness and computational efficiencies of dictionary learning algorithms
Part IV: Dictionary learning: applications, demonstrations, software tools, and resources (50 minutes)
  • Applications and demonstrations for blind source separation (in audio, speech and image mixtures), and audio-visual source separation
  • Applications and demonstrations of dictionary learning for audio-visual tracking of multiple speakers in a dynamic room environment
  • Applications and demonstrations for image denoising
  • Publicly available software tools & resources

Speaker Biography

Wei Dai

Wei Dai (S'01, M'07) received his B.E. degree in electronic engineering from Tsinghua University in 1999, and his Ph.D. degree in electrical and computer engineering from the University of Colorado at Boulder in 2007. From 2007 to 2010, he was a Postdoctoral Research Associate at the University of Illinois at Urbana-Champaign. Since 2011, he has been a Lecturer (equivalent to Assistant Professor) in the Department of Electrical and Electronic Engineering, Imperial College London. His interdisciplinary research interests include sparse signal processing, wireless communications, and applications of information theory and signal processing to biology. He has given a number of invited talks at academic workshops, industrial events, and university seminars.

Wenwu Wang

Wenwu Wang (M'03, SM'11) was born in Anhui, China. He received the B.Sc. degree in automatic control in 1997, the M.E. degree in control science and control engineering in 2000, and the Ph.D. degree in navigation guidance and control in 2002, all from Harbin Engineering University, China.

He joined the Department of Electronic Engineering, King's College London, U.K., as a Postdoctoral Research Associate in May 2002, and then transferred to the Cardiff School of Engineering, Cardiff University, U.K., in January 2004. In May 2005, he joined Tao Group Ltd. (now Antix Labs Ltd.), U.K., as a DSP engineer. In September 2006, he moved to the Sensaura Division of Creative Technology Ltd., U.K., as a Software R&D Engineer working on 3D positional audio technology for embedded systems and mobile devices. Since May 2007, he has been with the Centre for Vision Speech and Signal Processing, University of Surrey, U.K., where he is currently a Lecturer on Signal Processing. During the spring of 2008, he was also a visiting scholar in the Perception and Neurodynamics Lab and the Center for Cognitive Science at the Ohio State University, USA. He is also a member of the Ministry of Defence (MoD) University Defence Research Centre in Signal Processing and the BBC Audio Research Partnership.

He was an Area Chair of the 2012 European Signal Processing Conference, a Track Chair and Publicity Co-Chair of the 2009 IEEE Statistical Signal Processing Workshop, and Program Co-Chair of the 2009 IEEE Global Congress on Intelligent Systems. He has been a Session Chair for numerous conferences including ICASSP 2012 and EUSIPCO 2012. He is a Fellow of the Higher Education Academy, a Member of the ISCA, a Senior Member of the IEEE, and belongs to the IEEE Signal Processing, Circuits and Systems, and Computational Intelligence Societies. He has given a number of invited talks at academic workshops, industrial events, and university seminars.

His current research interests are in the areas of blind signal processing, machine audition (listening), audio-visual signal processing, sparse signal processing, machine learning, and perception. His research has been funded by the Engineering and Physical Sciences Research Council, Ministry of Defence, Defence Sciences and Technology Laboratory, Home Office, Royal Academy of Engineering, and the University Research Support Fund. He has published approximately 100 journal and conference papers and five book chapters, and edited a book titled Machine Audition, published by the IGI Press.

Boris Mailhé

Boris Mailhé (S'09, M'12) was born in Lyon. He received his B.Sc. and M.Sc. in computer science from Rennes 1 University and Ecole Normale Supérieure Cachan in 2004 and 2005, and his Ph.D. in signal processing from Rennes 1 University in 2010. Since 2011, he has been a Postdoctoral Research Assistant at Queen Mary, University of London. His research interests include large-scale sparse signal processing and non-convex optimization methods.