Speech Synthesis and Recognition
Second Edition
John Holmes and Wendy Holmes
London and New York
First edition by the late Dr J.N. Holmes published 1988 by Van Nostrand Reinhold

Second edition published 2001 by Taylor & Francis
11 New Fetter Lane, London EC4P 4EE

Simultaneously published in the USA and Canada by Taylor & Francis
29 West 35th Street, New York, NY 10001

Taylor & Francis is an imprint of the Taylor & Francis Group

This edition published in the Taylor & Francis e-Library, 2003.

© 2001 Wendy J. Holmes

Publisher’s Note
This book has been prepared from camera-ready copy provided by the authors.

All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Every effort has been made to ensure that the advice and information in this book is true and accurate at the time of going to press. However, neither the publisher nor the authors can accept any legal responsibility or liability for any errors or omissions that may be made. In the case of drug administration, any medical procedure or the use of technical equipment mentioned within this book, you are strongly advised to consult the manufacturer’s guidelines.
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging in Publication Data
Holmes, J.N.
Speech synthesis and recognition/John Holmes and Wendy Holmes. -- 2nd ed.
p. cm.
Includes bibliographical references and index.
ISBN 0-7484-0856-8 (hc.) -- ISBN 0-7484-0857-6 (pbk.)
1. Speech processing systems. I. Holmes, Wendy (Wendy J.) II. Title.
TK77882.S65 H64 2002
006.4’54–dc21
2001044279

ISBN 0-203-48468-1 Master e-book ISBN
ISBN 0-203-79292-0 (Adobe eReader Format)
ISBN 0-7484-0856-8 (hbk)
ISBN 0-7484-0857-6 (pbk)
CONTENTS
Preface to the First Edition
Preface to the Second Edition
List of Abbreviations

1 Human Speech Communication
1.1 Value of speech for human-machine communication
1.2 Ideas and language
1.3 Relationship between written and spoken language
1.4 Phonetics and phonology
1.5 The acoustic signal
1.6 Phonemes, phones and allophones
1.7 Vowels, consonants and syllables
1.8 Phonemes and spelling
1.9 Prosodic features
1.10 Language, accent and dialect
1.11 Supplementing the acoustic signal
1.12 The complexity of speech processing
Chapter 1 summary
Chapter 1 exercises

2 Mechanisms and Models of Human Speech Production
2.1 Introduction
2.2 Sound sources
2.3 The resonant system
2.4 Interaction of laryngeal and vocal tract functions
2.5 Radiation
2.6 Waveforms and spectrograms
2.7 Speech production models
2.7.1 Excitation models
2.7.2 Vocal tract models
Chapter 2 summary
Chapter 2 exercises

3 Mechanisms and Models of the Human Auditory System
3.1 Introduction
3.2 Physiology of the outer and middle ears
3.3 Structure of the cochlea
3.4 Neural response
3.5 Psychophysical measurements
3.6 Analysis of simple and complex signals
3.7 Models of the auditory system
3.7.1 Mechanical filtering
3.7.2 Models of neural transduction
3.7.3 Higher-level neural processing
Chapter 3 summary
Chapter 3 exercises

4 Digital Coding of Speech
4.1 Introduction
4.2 Simple waveform coders
4.2.1 Pulse code modulation
4.2.2 Delta modulation
4.3 Analysis/synthesis systems (vocoders)
4.3.1 Channel vocoders
4.3.2 Sinusoidal coders
4.3.3 LPC vocoders
4.3.4 Formant vocoders
4.3.5 Efficient parameter coding
4.3.6 Vocoders based on segmental/phonetic structure
4.4 Intermediate...