Last Modified : Sat, 27 Sep 14


This course brings the student to the forefront of signal processing, combining many practical results with a fundamental understanding of what is required to develop novel algorithms in speech recognition and in speech processing where the resulting signals are meant for listening, such as speech coding. The course covers speech processing in three parts:

  1. The acoustic theory of speech production and introductory acoustic phonetics, including inhomogeneous transmission line theory, reflectance, room acoustics, the short-time Fourier transform (and its inverse), and signal processing of speech, such as LPC/CELP/VQ.
  2. Psychoacoustics of speech perception, critical bands, masking, just-noticeable differences (JNDs), and the physiology of the auditory pathway; cochlear modeling.
  3. Information theory, entropy, channel capacity, the confusion matrix, state models, EM algorithms, and Bayesian networks. Classic papers on speech processing and speech perception are assigned and presented by student groups.


The textbook is Speech Analysis Synthesis and Perception (Third Edition) by James L. Flanagan, Jont B. Allen, and Mark A. Hasegawa-Johnson, to be published by Academic Press (2009) chapters

I have tried to provide all files in PDF format. For some older files, however, you will need a DjVu viewer to read or print them. A description of DjVu can be found at djvu.org. DjVu is useful because it stores scanned bitmaps (e.g., old papers for which there is no text or PDF version) in a highly compressed format. It automatically detects images and photos and compresses them differently from text.

There are many versions of the DjVu reader software, all free, and any of them will work. Several are open source, including djview4. If you have trouble, please let either Prof. Allen or TA Reggie Weece know, and we will help you get started.

