Synthetic Voice and Text-to-Speech Technology

Speech Synthesis and Audio Processing

bot (Guest) | Posts: 317 | Reputation: 12
bot :: Tue Sep 08 2009, 23:08
Digital Speech Processing, Synthesis, and Recognition. Sadaoki Furui
Over the past 50 years, digital signal processing has evolved into a major engineering discipline. The fields of signal processing have grown from the origins of the fast Fourier transform and digital filter design to statistical spectral analysis and array processing, to image, audio, and multimedia processing, and have shaped developments in high-performance VLSI signal processor design. Indeed, few fields enjoy so many applications: signal processing is everywhere in our lives.
When one uses a cellular phone, the voice is compressed, coded, and modulated using signal processing techniques. As a cruise missile winds along hillsides searching for the target, the signal processor is busy processing the images taken along the way. When we are watching a movie in HDTV, millions of audio and video samples are being sent to our homes and received with unbelievable fidelity. When scientists compare DNA samples, fast pattern recognition techniques are being used. On and on, one can see the impact of signal processing in almost every engineering and scientific discipline.
Because of the immense importance of signal processing and the fast-growing demands of business and industry, this series on signal processing serves to report up-to-date developments and advances in the field. The topics of interest include but are not limited to the following:
• Signal theory and analysis
• Statistical signal processing
• Speech and audio processing
• Image and video processing
• Multimedia signal processing and technology
• Signal processing for communications
• Signal processing architectures and VLSI design
I hope this series will provide the interested audience with high-quality, state-of-the-art signal processing literature through research monographs, edited books, and rigorously written textbooks by experts in their fields.
Contents
- Spoiler:
Series Introduction (K. J. Ray Liu) iii Preface to the Second Edition v Acknowledgments vii Preface to the First Edition xi 1. INTRODUCTION
2. PRINCIPAL CHARACTERISTICS OF SPEECH 5 2.1 Linguistic Information 5 2.2 Speech and Hearing 7 2.3 Speech Production Mechanism 9 2.4 Acoustic Characteristics of Speech 14 2.5 Statistical Characteristics of Speech 20 2.5.1 Distribution of amplitude level 20 2.5.2 Long-time averaged spectrum 23 2.5.3 Variation in fundamental frequency 24 2.5.4 Speech ratio 26
3. SPEECH PRODUCTION MODELS 27 3.1 Acoustical Theory of Speech Production 27 3.2 Linear Separable Equivalent Circuit Model 30 3.3 Vocal Tract Transmission Model 32 3.3.1 Progressing wave model 32 3.3.2 Resonance model 38 3.4 Vocal Cord Model
4. SPEECH ANALYSIS AND ANALYSIS-SYNTHESIS SYSTEMS 45 4.1 Digitization 45 4.1.1 Sampling 46 4.1.2 Quantization and coding 47 4.1.3 A/D and D/A conversion 51 4.2 Spectral Analysis 52 4.2.1 Spectral structure of speech 52 4.2.2 Autocorrelation and Fourier transform 53 4.2.3 Window function 57 4.2.4 Sound spectrogram 60 4.3 Cepstrum 62 4.3.1 Cepstrum and its application 62 4.3.2 Homomorphic analysis and LPC cepstrum 66 4.4 Filter Bank and Zero-Crossing Analysis 70 4.4.1 Digital filter bank 70 4.4.2 Zero-crossing analysis 70 4.5 Analysis-by-Synthesis 71 4.6 Analysis-Synthesis Systems 73 4.6.1 Analysis-synthesis system structure 73 4.6.2 Examples of analysis-synthesis systems 73 4.7 Pitch Extraction
5. LINEAR PREDICTIVE CODING (LPC) ANALYSIS 83 5.1 Principles of LPC Analysis 83 5.2 LPC Analysis Procedure 86 5.3 Maximum Likelihood Spectral Estimation 89 5.3.1 Formulation of maximum likelihood spectral estimation 89 5.3.2 Physical meaning of maximum likelihood spectral estimation 93 5.4 Source Parameter Estimation from Residual Signals 98 5.5 Speech Analysis-Synthesis System by LPC 99 5.6 PARCOR Analysis 102 5.6.1 Formulation of PARCOR analysis 102 5.6.2 Relationship between PARCOR and LPC coefficients 108 5.6.3 PARCOR synthesis filter 109 5.6.4 Vocal tract area estimation based on PARCOR analysis 110 5.7 Line Spectrum Pair (LSP) Analysis 116 5.7.1 Principle of LSP analysis 116 5.7.2 Solution of LSP analysis 119 5.7.3 LSP synthesis filter 122 5.7.4 Coding of LSP parameters 126 5.7.5 Composite sinusoidal model 126 5.7.6 Mutual relationships between LPC parameters 127 5.8 Pole-Zero Analysis 129
6. SPEECH CODING 6.1 Principal Techniques for Speech Coding 133 6.1.1 Reversible coding 133 6.1.2 Irreversible coding and information rate distortion theory 134 6.1.3 Waveform coding and analysis-synthesis systems 135 6.1.4 Basic techniques for waveform coding methods 138 6.2 Coding in Time Domain 141 6.2.1 Pulse code modulation (PCM) 141 6.2.2 Adaptive quantization 143 6.2.3 Predictive coding 143 6.2.4 Delta modulation 149 6.2.5 Adaptive differential PCM (ADPCM) 151 6.2.6 Adaptive predictive coding (APC) 153 6.2.7 Noise shaping 156 6.3 Coding in Frequency Domain 159 6.3.1 Subband coding (SBC) 159 6.3.2 Adaptive transform coding (ATC) 163 6.3.3 APC with adaptive bit allocation (APC-AB) 166 6.3.4 Time-domain harmonic scaling (TDHS) algorithm 168 6.4 Vector Quantization 173 6.4.1 Multipath search coding 173 6.4.2 Principles of vector quantization 175 6.4.3 Tree search and multistage processing 178 6.4.4 Vector quantization for linear predictor parameters 180 6.4.5 Matrix quantization and finite-state vector quantization 182 6.5 Hybrid Coding 187 6.5.1 Residual- or speech-excited linear predictive coding 187 6.5.2 Multipulse-excited linear predictive coding (MPC) 189 6.5.3 Code-excited linear predictive coding (CELP) 193 6.5.4 Coding by phase equalization and variable-rate tree coding 196 6.6 Evaluation and Standardization of Coding Methods 199 6.6.1 Evaluation factors of speech coding systems 199 6.6.2 Speech coding standards 203 6.7 Robust and Flexible Speech Coding 211
7. SPEECH SYNTHESIS 7.1 Principles of Speech Synthesis 213 7.2 Synthesis Based on Waveform Coding 217 7.3 Synthesis Based on Analysis-Synthesis Method 221 7.4 Synthesis Based on Speech Production Mechanism 222 7.4.1 Vocal tract analog method 223 7.4.2 Terminal analog method 224 7.5 Synthesis by Rule 226 7.5.1 Principles of synthesis by rule 226 7.5.2 Control of prosodic features 230 7.6 Text-to-Speech Conversion 234 7.7 Corpus-Based Speech Synthesis 237
8. SPEECH RECOGNITION 8.1 Principles of Speech Recognition 243 8.1.1 Advantages of speech recognition 243 8.1.2 Difficulties in speech recognition 245 8.1.3 Classification of speech recognition 246 8.2 Speech Period Detection 248 8.3 Spectral Distance Measures 249 8.3.1 Distance measures used in speech recognition 249 8.3.2 Distances based on nonparametric spectral analysis 251 8.3.3 Distances based on LPC 252 8.3.4 Peak-weighted distances based on LPC analysis 258 8.3.5 Weighted cepstral distance 260 8.3.6 Transitional cepstral distance 262 8.3.7 Prosody 264 8.4 Structure of Word Recognition Systems 264 8.5 Dynamic Time Warping (DTW) 266 8.5.1 DP matching 266 8.5.2 Variations in DP matching 270 8.5.3 Staggered array DP matching 272 8.6 Word Recognition Using Phoneme Units 275 8.6.1 Principal structure 275 8.6.2 SPLIT method 277 8.7 Theory and Implementation of HMM 278 8.7.1 Fundamentals of HMM 278 8.7.2 Three basic problems for HMMs 282 8.7.3 Solution to Problem 1—probability evaluation 283 8.7.4 Solution to Problem 2—optimal state sequence 286 8.7.5 Solution to Problem 3—parameter estimation 288 8.7.6 Continuous observation densities in HMMs 290 8.7.7 Tied-mixture HMM 292 8.7.8 MMI and MCE/GPD training of HMM 292 8.7.9 HMM system for word recognition 293 8.8 Connected Word Recognition 295 8.8.1 Two-level DP matching and its modifications 295 8.8.2 Word spotting 303 8.9 Large-Vocabulary Continuous-Speech Recognition 306 8.9.1 Three principal structural models 306 8.9.2 Other system constructing factors 308 8.9.3 Statistical theory of continuous-speech recognition 311 8.9.4 Statistical language modeling 312 8.9.5 Typical structure of large-vocabulary continuous-speech recognition systems 314 8.9.6 Methods for evaluating recognition systems 320 8.10 Examples of Large-Vocabulary Continuous-Speech Recognition Systems 323 8.10.1 DARPA speech recognition projects 323 8.10.2 English speech recognition system at LIMSI Laboratory 324 8.10.3 English speech recognition system at IBM Laboratory 325 8.10.4 A Japanese speech recognition system 328 8.11 Speaker-Independent and Adaptive Recognition 330 8.11.1 Multi-template method 332 8.11.2 Statistical method 333 8.11.3 Speaker normalization method 334 8.11.4 Speaker adaptation methods 335 8.11.5 Unsupervised speaker adaptation method 8.12 Robust Algorithms Against Noise and Channel Variations 8.12.1 HMM composition/PMC 8.12.2 Detection-based approach for spontaneous speech recognition
9. SPEAKER RECOGNITION 9.1 Principles of Speaker Recognition 9.1.1 Human and computer speaker recognition 9.1.2 Individual characteristics 9.2 Speaker Recognition Methods 9.2.1 Classification of speaker recognition methods 9.2.2 Structure of speaker recognition systems 9.2.3 Relationship between error rate and number of speakers 9.2.4 Intra-speaker variation and evaluation of feature parameters 9.2.5 Likelihood (distance) normalization 9.3 Examples of Speaker Recognition Systems 9.3.1 Text-dependent speaker recognition systems 9.3.2 Text-independent speaker recognition systems 9.3.3 Text-prompted speaker recognition systems
10 FUTURE DIRECTIONS OF SPEECH INFORMATION PROCESSING 10.1 Overview 375 10.2 Analysis and Description of Dynamic Features 378 10.3 Extraction and Normalization of Voice Individuality 379 10.4 Adaptation to Environmental Variation 380 10.5 Basic Units for Speech Processing 381 10.6 Advanced Knowledge Processing 382 10.7 Clarification of Speech Production Mechanism 383 10.8 Clarification of Speech Perception Mechanism 384 10.9 Evaluation Methods for Speech Processing Technologies 385 10.10 LSI for Speech Processing Use 386
APPENDICES A Convolution and z-Transform 387 A.1 Convolution 387 A.2 z-Transform 388 A.3 Stability 391 B Vector Quantization Algorithm B.1 VQ (Vector Quantization) Technique Formulation 393 B.2 Lloyd's Algorithm (K-Means Algorithm) 394 B.3 LBG Algorithm 395 C Neural Nets 399 Bibliography 405 Index 437
- Spoiler:
Digital Speech Processing, Synthesis, and Recognition, Second Edition
bot :: Tue Sep 08 2009, 23:17
Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Daniel Jurafsky & James H. Martin
Table of Contents
- Spoiler:
Preface 1 Introduction
I: Words 2 Regular Expressions and Automata 3 Words and Transducers 4 N-grams 5 Part-of-Speech Tagging 6 Hidden Markov and Maximum Entropy Models
II: Speech 7 Phonetics 8 Speech Synthesis 9 Automatic Speech Recognition 10 Speech Recognition: Advanced Topics 11 Computational Phonology
III: Syntax 12 Formal Grammars of English 13 Syntactic Parsing 14 Statistical Parsing 15 Features and Unification 16 Language and Complexity
IV: Semantics and Pragmatics 17 The Representation of Meaning 18 Computational Semantics 19 Lexical Semantics 20 Computational Lexical Semantics 21 Computational Discourse
V: Applications 22 Information Extraction 23 Question Answering and Summarization 24 Dialog and Conversational Agents 25 Machine Translation
- Spoiler:
Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edition
bot :: Tue Sep 08 2009, 23:29
Foundations of Statistical Natural Language Processing. Christopher D. Manning and Hinrich Schütze
Table of Contents
- Spoiler:
List of Tables List of Figures Table of Notations Preface Road Map I Preliminaries 1 Introduction 1.1 Rationalist and Empiricist Approaches to Language 1.2 Scientific Content 1.3 The Ambiguity of Language: Why NLP Is Difficult 1.4 Dirty Hands 1.5 Further Reading 1.6 Exercises
2 Mathematical Foundations 2.1 Elementary Probability Theory 2.2 Essential Information Theory 2.3 Further Reading
3 Linguistics Essentials 3.1 Parts of Speech and Morphology 3.2 Phrase Structure 3.3 Semantics and Pragmatics 3.4 Other Areas 3.5 Further Reading 3.6 Exercises
4 Corpus-Based Work 4.1 Getting Set Up 4.2 Looking at Text 4.3 Marked-Up Data 4.4 Further Reading 4.5 Exercises
II Words 5 Collocations 5.1 Frequency 5.2 Mean and Variance 5.3 Hypothesis Testing 5.4 Mutual Information 5.5 The Notion of Collocation 5.6 Further Reading
6 Statistical Inference: n-gram Models over Sparse Data 6.1 Bins: Forming Equivalence Classes 6.2 Statistical Estimators 6.3 Combining Estimators 6.4 Conclusions 6.5 Further Reading 6.6 Exercises
7 Word Sense Disambiguation 7.1 Methodological Preliminaries 7.2 Supervised Disambiguation 7.3 Dictionary-Based Disambiguation 7.4 Unsupervised Disambiguation 7.5 What Is a Word Sense? 7.6 Further Reading 7.7 Exercises
8 Lexical Acquisition 8.1 Evaluation Measures 8.2 Verb Subcategorization 8.3 Attachment Ambiguity 8.4 Selectional Preferences 8.5 Semantic Similarity 8.6 The Role of Lexical Acquisition in Statistical NLP 8.7 Further Reading
III Grammar 9 Markov Models 9.1 Markov Models 9.2 Hidden Markov Models 9.3 The Three Fundamental Questions for HMMs 9.4 HMMs: Implementation, Properties, and Variants 9.5 Further Reading
10 Part-of-Speech Tagging 10.1 The Information Sources in Tagging 10.2 Markov Model Taggers 10.3 Hidden Markov Model Taggers 10.4 Transformation-Based Learning of Tags 10.5 Other Methods, Other Languages 10.6 Tagging Accuracy and Uses of Taggers 10.7 Further Reading 10.8 Exercises
11 Probabilistic Context Free Grammars 11.1 Some Features of PCFGs 11.2 Questions for PCFGs 11.3 The Probability of a String 11.4 Problems with the Inside-Outside Algorithm 11.5 Further Reading 11.6 Exercises
12 Probabilistic Parsing 12.1 Some Concepts 12.2 Some Approaches 12.3 Further Reading 12.4 Exercises
IV Applications and Techniques 13 Statistical Alignment and Machine Translation 13.1 Text Alignment 13.2 Word Alignment 13.3 Statistical Machine Translation 13.4 Further Reading
14 Clustering 14.1 Hierarchical Clustering 14.2 Non-Hierarchical Clustering 14.3 Further Reading 14.4 Exercises
15 Topics in Information Retrieval 15.1 Some Background on Information Retrieval 15.2 The Vector Space Model 15.3 Term Distribution Models 15.4 Latent Semantic Indexing 15.5 Discourse Segmentation 15.6 Further Reading 15.7 Exercises
16 Text Categorization 16.1 Decision Trees 16.2 Maximum Entropy Modeling 16.3 Perceptrons 16.4 k Nearest Neighbor Classification 16.5 Further Reading
Tiny Statistical Tables Bibliography Index
- Spoiler:
Foundations of Statistical Natural Language Processing
bot :: Thu Sep 10 2009, 21:35
Speech Synthesis and Recognition. Wendy Holmes
Description:
With the growing impact of information technology on daily life, speech is becoming increasingly important for providing a natural means of communication between humans and machines. This extensively reworked and updated new edition of Speech Synthesis and Recognition is an easy-to-read introduction to current speech technology. Aimed at advanced undergraduates and graduates in electronic engineering, computer science and information technology, the book is also relevant to professional engineers who need to understand enough about speech technology to be able to apply it successfully and to work effectively with speech experts. No advanced mathematical ability is required and no specialist prior knowledge of phonetics or of the properties of speech signals is assumed. Speech Synthesis and Recognition:
· Explains the complexity of speech communication
· Describes mechanisms and models of human speech production and perception
· Covers concatenative synthesis techniques and formant synthesis by rule, as well as the processing required for synthesis from text
· Introduces methods for automatic speech recognition by whole-word template matching and by statistical pattern matching using hidden Markov models
· Describes practical techniques that contribute to the successful implementation of speech recognition systems, including those for recognizing very large vocabularies
· Includes chapters covering the related technologies of digital speech coding and automatic recognition of speaker characteristics
· Discusses applications and performance of current speech technology
Throughout the book the emphasis is on explaining underlying principles with sufficient but not unnecessary detail, so as to provide the reader with a thorough grounding in the problems and techniques of speech synthesis and recognition. This book is therefore ideal as an introduction before tackling more advanced texts.
CONTENTS
- Spoiler:
Preface to the First Edition xiii Preface to the Second Edition xv List of Abbreviations xvii
1 Human Speech Communication 1 1.1 Value of speech for human-machine communication 1 1.2 Ideas and language 1 1.3 Relationship between written and spoken language 1 1.4 Phonetics and phonology 2 1.5 The acoustic signal 2 1.6 Phonemes, phones and allophones 3 1.7 Vowels, consonants and syllables 4 1.8 Phonemes and spelling 6 1.9 Prosodic features 6 1.10 Language, accent and dialect 7 1.11 Supplementing the acoustic signal 8 1.12 The complexity of speech processing 9 Chapter 1 summary 10 Chapter 1 exercises 10
2 Mechanisms and Models of Human Speech Production 11 2.1 Introduction 11 2.2 Sound sources 12 2.3 The resonant system 15 2.4 Interaction of laryngeal and vocal tract functions 19 2.5 Radiation 21 2.6 Waveforms and spectrograms 21 2.7 Speech production models 25 2.7.1 Excitation models 26 2.7.2 Vocal tract models 27 Chapter 2 summary 31 Chapter 2 exercises 32
3 Mechanisms and Models of the Human Auditory System 33 3.1 Introduction 33 3.2 Physiology of the outer and middle ears 33 3.3 Structure of the cochlea 34 3.4 Neural response 36 3.5 Psychophysical measurements 38 3.6 Analysis of simple and complex signals 41 3.7 Models of the auditory system 42 3.7.1 Mechanical filtering 42 3.7.2 Models of neural transduction 43 3.7.3 Higher-level neural processing 43 Chapter 3 summary 46 Chapter 3 exercises 46
4 Digital Coding of Speech 47 4.1 Introduction 47 4.2 Simple waveform coders 48 4.2.1 Pulse code modulation 48 4.2.2 Delta modulation 50 4.3 Analysis/synthesis systems (vocoders) 52 4.3.1 Channel vocoders 53 4.3.2 Sinusoidal coders 53 4.3.3 LPC vocoders 54 4.3.4 Formant vocoders 56 4.3.5 Efficient parameter coding 57 4.3.6 Vocoders based on segmental/phonetic structure 58 4.4 Intermediate systems 58 4.4.1 Sub-band coding 59 4.4.2 Linear prediction with simple coding of the residual 60 4.4.3 Adaptive predictive coding 60 4.4.4 Multipulse LPC 62 4.4.5 Code-excited linear prediction 62 4.5 Evaluating speech coding algorithms 63 4.5.1 Subjective speech intelligibility measures 64 4.5.2 Subjective speech quality measures 64 4.5.3 Objective speech quality measures 64 4.6 Choosing a coder 65 Chapter 4 summary 66 Chapter 4 exercises 66
5 Message Synthesis from Stored Human Speech Components 67 5.1 Introduction 67 5.2 Concatenation of whole words 67 5.2.1 Simple waveform concatenation 67 5.2.2 Concatenation of vocoded words 70 5.2.3 Limitations of concatenating word-size units 71 5.3 Concatenation of sub-word units: general principles 71 5.3.1 Choice of sub-word unit 71 5.3.2 Recording and selecting data for the units 72 5.3.3 Varying durations of concatenative units 73 5.4 Synthesis by concatenating vocoded sub-word units 74 5.5 Synthesis by concatenating waveform segments 74 5.5.1 Pitch modification 75 5.5.2 Timing modification 77 5.5.3 Performance of waveform concatenation 77 5.6 Variants of concatenative waveform synthesis 78 5.7 Hardware requirements 79 Chapter 5 summary 80 Chapter 5 exercises 80
6 Phonetic synthesis by rule 81 6.1 Introduction 81 6.2 Acoustic-phonetic rules 81 6.3 Rules for formant synthesizers 82 6.4 Table-driven phonetic rules 83 6.4.1 Simple transition calculation 84 6.4.2 Overlapping transitions 85 6.4.3 Using the tables to generate utterances 86 6.5 Optimizing phonetic rules 89 6.5.1 Automatic adjustment of phonetic rules 89 6.5.2 Rules for different speaker types 90 6.5.3 Incorporating intensity rules 91 6.6 Current capabilities of phonetic synthesis by rule 91 Chapter 6 summary 92 Chapter 6 exercises 92
7 Speech Synthesis from Textual or Conceptual Input 93 7.1 Introduction 93 7.2 Emulating the human speaking process 93 7.3 Converting from text to speech 94 7.3.1 TTS system architecture 94 7.3.2 Overview of tasks required for TTS conversion 96 7.4 Text analysis 97 7.4.1 Text pre-processing 97 7.4.2 Morphological analysis 99 7.4.3 Phonetic transcription 100 7.4.4 Syntactic analysis and prosodic phrasing 101 7.4.5 Assignment of lexical stress and pattern of word accents 102 7.5 Prosody generation 102 7.5.1 Timing pattern 103 7.5.2 Fundamental frequency contour 104 7.6 Implementation issues 106 7.7 Current TTS synthesis capabilities 107 7.8 Speech synthesis from concept 107 Chapter 7 summary 108 Chapter 7 exercises 108
8 Introduction to automatic speech recognition: template matching 109 8.1 Introduction 109 8.2 General principles of pattern matching 109 8.3 Distance metrics 110 8.3.1 Filter-bank analysis 111 8.3.2 Level normalization 112 8.4 End-point detection for isolated words 114 8.5 Allowing for timescale variations 115 8.6 Dynamic programming for time alignment 115 8.7 Refinements to isolated-word DP matching 117 8.8 Score pruning 118 8.9 Allowing for end-point errors 121 8.10 Dynamic programming for connected words 121 8.11 Continuous speech recognition 124 8.12 Syntactic constraints 125 8.13 Training a whole-word recognizer 125 Chapter 8 summary 126 Chapter 8 exercises 126
9 Introduction to stochastic modelling 127 9.1 Feature variability in pattern matching 127 9.2 Introduction to hidden Markov models 128 9.3 Probability calculations in hidden Markov models 130 9.4 The Viterbi algorithm 133 9.5 Parameter estimation for hidden Markov models 134 9.5.1 Forward and backward probabilities 135 9.5.2 Parameter re-estimation with forward and backward probabilities 136 9.5.3 Viterbi training 139 9.6 Vector quantization 140 9.7 Multi-variate continuous distributions 141 9.8 Use of normal distributions with HMMs 142 9.8.1 Probability calculations 143 9.8.2 Estimating the parameters of a normal distribution 144 9.8.3 Baum-Welch re-estimation 144 9.8.4 Viterbi training 145 9.9 Model initialization 146 9.10 Gaussian mixtures 147 9.10.1 Calculating emission probabilities 147 9.10.2 Baum-Welch re-estimation 148 9.10.3 Re-estimation using the most likely state sequence 149 9.10.4 Initialization of Gaussian mixture distributions 150 9.10.5 Tied mixture distributions 151 9.11 Extension of stochastic models to word sequences 152 9.12 Implementing probability calculations 153 9.12.1 Using the Viterbi algorithm with probabilities in logarithmic form 153 9.12.2 Adding probabilities when they are in logarithmic form 154 9.13 Relationship between DTW and a simple HMM 155 9.14 State durational characteristics of HMMs 156 Chapter 9 summary 157 Chapter 9 exercises 158
10 Introduction to front-end analysis for automatic speech recognition 159 10.1 Introduction 159 10.2 Pre-emphasis 159 10.3 Frames and windowing 159 10.4 Filter banks, Fourier analysis and the mel scale 160 10.5 Cepstral analysis 161 10.6 Analysis based on linear prediction 165 10.7 Dynamic features 166 10.8 Capturing the perceptually relevant information 167 10.9 General feature transformations 167 10.10 Variable-frame-rate analysis 167 Chapter 10 summary 168 Chapter 10 exercises 168
11 Practical techniques for improving speech recognition performance 169 11.1 Introduction 169 11.2 Robustness to environment and channel effects 169 11.2.1 Feature-based techniques 171 11.2.2 Model-based techniques 171 11.2.3 Dealing with unknown or unpredictable noise corruption 173 11.3 Speaker-independent recognition 174 11.3.1 Speaker normalization 175 11.4 Model adaptation 176 11.4.1 Bayesian methods for training and adaptation of HMMs 176 11.4.2 Adaptation methods based on linear transforms 178 11.5 Discriminative training methods 179 11.5.1 Maximum mutual information training 179 11.5.2 Training criteria based on reducing recognition errors 180 11.6 Robustness of recognizers to vocabulary variation 181 Chapter 11 summary 181 Chapter 11 exercises 182
12 Automatic speech recognition for large vocabularies 183 12.1 Introduction 183 12.2 Historical perspective 183 12.3 Speech transcription and speech understanding 184 12.4 Speech transcription 185 12.5 Challenges posed by large vocabularies 186 12.6 Acoustic modelling 187 12.6.1 Context-dependent phone modelling 188 12.6.2 Training issues for context-dependent models 188 12.6.3 Parameter tying 190 12.6.4 Training procedure 190 12.6.5 Methods for clustering model parameters 193 12.6.6 Constructing phonetic decision trees 194 12.6.7 Extensions beyond triphone modelling 195 12.7 Language modelling 196 12.7.1 N-grams 197 12.7.2 Perplexity and evaluating language models 197 12.7.3 Data sparsity in language modelling 198 12.7.4 Discounting 199 12.7.5 Backing off in language modelling 200 12.7.6 Interpolation of language models 200 12.7.7 Choice of more general distribution for smoothing 201 12.7.8 Improving on simple N-grams 202 12.8 Decoding 203 12.8.1 Efficient one-pass Viterbi decoding for large vocabularies 203 12.8.2 Multiple-pass Viterbi decoding 204 12.8.3 Depth-first decoding 205 12.9 Evaluating LVCSR performance 205 12.9.1 Measuring errors 205 12.9.2 Controlling word insertion errors 206 12.9.3 Performance evaluations 206 12.10 Speech understanding 209 12.10.1 Measuring and evaluating speech understanding performance 210 Chapter 12 summary 211 Chapter 12 exercises 212
13 Neural networks for speech recognition 213 13.1 Introduction 213 13.2 The human brain 213 13.3 Connectionist models 214 13.4 Properties of ANNs 215 13.5 ANNs for speech recognition 216 13.5.1 Hybrid HMM/ANN methods 217 Chapter 13 summary 218 Chapter 13 exercises 218
14 Recognition of speaker characteristics 219 14.1 Characteristics of speakers 219 14.2 Verification versus identification 219 14.2.1 Assessing performance 220 14.2.2 Measures of verification performance 221 14.3 Speaker recognition 224 14.3.1 Text dependence 224 14.3.2 Methods for text-dependent/text-prompted speaker recognition 224 14.3.3 Methods for text-independent speaker recognition 225 14.3.4 Acoustic features for speaker recognition 226 14.3.5 Evaluations of speaker recognition performance 227 14.4 Language recognition 228 14.4.1 Techniques for language recognition 228 14.4.2 Acoustic features for language recognition 229 Chapter 14 summary 230 Chapter 14 exercises 230
15 Applications and performance of current technology 231 15.1 Introduction 231 15.2 Why use speech technology? 231 15.3 Speech synthesis technology 232 15.4 Examples of speech synthesis applications 233 15.4.1 Aids for the disabled 233 15.4.2 Spoken warning signals, instructions and user feedback 233 15.4.3 Education, toys and games 234 15.4.4 Telecommunications 234 15.5 Speech recognition technology 235 15.5.1 Characterizing speech recognizers and recognition tasks 235 15.5.2 Typical recognition performance for different tasks 237 15.5.3 Achieving success with ASR in an application 238 15.6 Examples of ASR applications 239 15.6.1 Command and control 239 15.6.2 Education, toys and games 239 15.6.3 Dictation 240 15.6.4 Data entry and retrieval 240 15.6.5 Telecommunications 241 15.7 Applications of speaker and language recognition 243 15.8 The future of speech technology applications 243 Chapter 15 summary 244 Chapter 15 exercises 244
16 Future research directions in speech synthesis and recognition 245 16.1 Introduction 245 16.2 Speech synthesis 245 16.2.1 Speech sound generation 246 16.2.2 Prosody generation and higher-level linguistic processing 247 16.3 Automatic speech recognition 248 16.3.1 Advantages of statistical pattern-matching methods 248 16.3.2 Limitations of HMMs for speech recognition 249 16.3.3 Developing improved recognition models 250 16.4 Relationship between synthesis and recognition 252 16.5 Automatic speech understanding 253 Chapter 16 summary 254 Chapter 16 exercises 254
17 Further Reading 255 17.1 Books 255 17.2 Journals 256 17.3 Conferences and workshops 256 17.4 The Internet 257 17.5 Reading for individual chapters 258
References 265 Solutions to Exercises 277 Glossary 283 Index 287
- Spoiler:
Speech Synthesis and Recognition
bot :: Thu Sep 10 2009, 21:46
Developments in Speech Synthesis. Mark Tatham, Katherine Morton
Description:
With a growing need for understanding the processes involved in producing and perceiving spoken language, this timely publication addresses that need in an accessible reference. Containing material resulting from many years’ teaching and research, Developments in Speech Synthesis provides a complete account of the theory of speech. By bringing together the common goals and methods of speech synthesis into a single resource, the book leads the way towards a comprehensive view of the processes involved in human speech. The book includes applications in speech technology and speech synthesis.
It is ideal for intermediate students of linguistics and phonetics who wish to proceed further, as well as researchers and engineers in telecommunications working in speech technology and speech synthesis who need a comprehensive overview of the field and who wish to gain an understanding of the objectives and achievements of the study of speech production and perception.
Contents
- Spoiler:
Acknowledgements xiii Introduction 1 How Good is Synthetic Speech? 1 Improvements Beyond Intelligibility 1 Continuous Adaptation 2 Data Structure Characterisation 3 Shared Input Properties 4 Intelligibility: Some Beliefs and Some Myths 5 Naturalness 7 Variability 8 The Introduction of Style 10 Expressive Content 11 Final Introductory Remarks 13
Part I Current Work 15 1 High-Level and Low-Level Synthesis 17 1.1 Differentiating Between Low-Level and High-Level Synthesis 17 1.2 Two Types of Text 17 1.3 The Context of High-Level Synthesis 18 1.4 Textual Rendering 20 2 Low-Level Synthesisers: Current Status 23 2.1 The Range of Low-Level Synthesisers Available 23 2.1.1 Articulatory Synthesis 23 2.1.2 Formant Synthesis 24 2.1.3 Concatenative Synthesis 28 Units for Concatenative Synthesis 28 Representation of Speech in the Database 31 Unit Selection Systems: the Data-Driven Approach 32 Unit Joining 33 Cost Evaluation in Unit Selection Systems 35 Prosody and Concatenative Systems 35 Prosody Implementation in Unit Concatenation Systems 36 2.1.4 Hybrid System Approaches to Speech Synthesis 37 3 Text-To-Speech 39 3.1 Methods 39 3.2 The Syntactic Parse 39 4 Different Low-Level Synthesisers: What Can Be Expected? 43 4.1 The Competing Types 43 4.2 The Theoretical Limits 45 4.3 Upcoming Approaches 45 5 Low-Level Synthesis Potential 47 5.1 The Input to Low-Level Synthesis 47 5.2 Text Marking 48 5.2.1 Unmarked Text 48 5.2.2 Marked Text: the Basics 48 5.2.3 Waveforms and Segment Boundaries 50 5.2.4 Marking Boundaries on Waveforms: the Alignment Problem 51 5.2.5 Labelling the Database: Segments 54 5.2.6 Labelling the Database: Endpointing and Alignment 55
Part II A New Direction for Speech Synthesis 57 6 A View of Naturalness 59 6.1 The Naturalness Concept 59 6.2 Switchable Databases for Concatenative Synthesis 60 6.3 Prosodic Modifications 61 7 Physical Parameters and Abstract Information Channels 63 7.1 Limitations in the Theory and Scope of Speech Synthesis 63 7.1.1 Distinguishing Between Physical and Cognitive Processes 64 7.1.2 Relationship Between Physical and Cognitive Objects 65 7.1.3 Implications 65 7.2 Intonation Contours from the Original Database 65 7.3 Boundaries in Intonation 67 8 Variability and System Integrity 69 8.1 Accent Variation 69 8.2 Voicing 72 8.3 The Festival System 74 8.4 Syllable Duration 75 8.5 Changes of Approach in Speech Synthesis 76 9 Automatic Speech Recognition 79 9.1 Advantages of the Statistical Approach 80 9.2 Disadvantages of the Statistical Approach 81 9.3 Unit Selection Synthesis Compared with Automatic Speech Recognition 81
Part III High-Level Control 83
10 The Need for High-Level Control 85
10.1 What is High-Level Control? 85
10.2 Generalisation in Linguistics 86
10.3 Units in the Signal 89
10.4 Achievements of a Separate High-Level Control 90
10.5 Advantages of Identifying High-Level Control 90
11 The Input to High-Level Control 93
11.1 Segmental Linguistic Input 93
11.2 The Underlying Linguistics Model 94
11.3 Prosody 96
11.4 Expression 98
12 Problems for Automatic Text Markup 99
12.1 The Markup and the Data 100
12.2 Generality on the Static Plane 101
12.3 Variability in the Database – or Not 102
12.4 Multiple Databases and Perception 105
12.5 Selecting Within a Marked Database 105
Part IV Areas for Improvement 109
13 Filling Gaps 111
13.1 General Prosody 111
13.2 Prosody: Expression 112
13.3 The Segmental Level: Accents and Register 113
13.4 Improvements to be Expected from Filling the Gaps 115
14 Using Different Units 119
14.1 Trade-Offs Between Units 119
14.2 Linguistically Motivated Units 119
14.3 A-Linguistic Units 121
14.4 Concatenation 123
14.5 Improved Naturalness Using Large Units 123
15 Waveform Concatenation Systems: Naturalness and Large Databases 127
15.1 The Beginnings of Useful Automated Markup Systems 129
15.2 How Much Detail in the Markup? 129
15.3 Prosodic Markup and Segmental Consequences 132
15.3.1 Method 1: Prosody Normalisation 132
15.3.2 Method 2: Prosody Extraction 133
15.4 Summary of Database Markup and Content 135
16 Unit Selection Systems 137
16.1 The Supporting Theory for Synthesis 137
16.2 Terms 138
16.3 The Database Paradigm and the Limits of Synthesis 139
16.4 Variability in the Database 139
16.5 Types of Database 140
16.6 Database Size and Searchability at Low-Level 142
16.6.1 Database Size 142
16.6.2 Database Searchability 144
Part V Markup 145
17 VoiceXML 147
17.1 Introduction 147
17.2 VoiceXML and XML 148
17.3 VoiceXML: Functionality 148
17.4 Principal VoiceXML Elements 149
17.5 Tapping the Autonomy of the Attached Synthesis System 151
18 Speech Synthesis Markup Language (SSML) 153
18.1 Introduction 153
18.2 Original W3C Design Criteria for SSML 153
Consistency 153
Interoperability 154
Generality 154
Internationalisation 154
Generation and Readability 155
Implementability 155
18.3 Extensibility 155
18.4 Processing the SSML Document 155
18.4.1 XML Parse 156
18.4.2 Structure Analysis 156
18.4.3 Text Normalisation 157
18.4.4 Text-To-Phoneme Conversion 157
18.4.5 Prosody Analysis 159
18.4.6 Waveform Production 160
18.5 Main SSML Elements and Their Attributes 160
18.5.1 Document Structure, Text Processing and Pronunciation 160
18.5.2 Prosody and Style 161
18.5.3 Other Elements 162
18.5.4 Comment 162
19 SABLE 165
20 The Need for Prosodic Markup 167
20.1 What is Prosody? 167
20.2 Incorporating Prosodic Markup 167
20.3 How Markup Works 168
20.4 Distinguishing Layout from Content 168
20.5 Uses of Markup 169
20.6 Basic Control of Prosody 170
20.7 Intrinsic and Extrinsic Structure and Salience 172
20.8 Automatic Markup to Enhance Orthography: Interoperability with the Synthesiser 174
20.9 Hierarchical Application of Markup 175
20.10 Markup and Perception 176
20.11 Markup: the Way Ahead? 177
20.12 Mark What and How? 179
20.12.1 Automatic Annotation of Databases for Limited Domain Systems 180
20.12.2 Database Markup with the Minimum of Phonology 180
20.13 Abstract Versus Physical Prosody 182
Part VI Strengthening the High-Level Model 183
21 Speech 185
21.1 Introductory Note 185
21.2 Speech Production 186
21.3 Relevance to Acoustics 186
21.4 Summary 187
21.5 Information for Synthesis: Limitations 187
22 Basic Concepts 189
22.1 How does Speaking Occur? 189
22.2 Underlying Basic Disciplines: Contributions from Linguistics 191
22.2.1 Linguistic Information and Speech 191
22.2.2 Specialist Use of the Terms ‘Phonology’ and ‘Phonetics’ 192
22.2.3 Rendering the Plan 193
22.2.4 Types of Model Underlying Speech Synthesis 194
The Static Model 194
The Dynamic Model 194
23 Underlying Basic Disciplines: Expression Studies 197
23.1 Biology and Cognitive Psychology 197
23.2 Modelling Biological and Cognitive Events 198
23.3 Basic Assumptions in Our Proposed Approach 198
23.4 Biological Events 198
23.5 Cognitive Events 201
23.6 Indexing Expression in XML 203
23.7 Summary 204
24 Labelling Expressive/Emotive Content 207
24.1 Data Collection 208
24.2 Sources of Variability 209
24.3 Summary 210
25 The Proposed Model 213
25.1 Organisation of the Model 213
25.2 The Two Stages of the Model 214
25.3 Conditions and Restrictions on XML 214
25.4 Summary 215
26 Types of Model 217
26.1 Category Models 217
26.2 Process Models 218
Part VII Expanded Static and Dynamic Modelling 219
27 The Underlying Linguistics System 221
27.1 Dynamic Planes 221
27.2 Computational Dynamic Phonology for Synthesis 222
27.3 Computational Dynamic Phonetics for Synthesis 223
27.4 Adding How, What and Notions of Time 224
27.5 Static Planes 224
27.6 Computational Static Phonology for Synthesis 225
27.7 The Term Process in Linguistics 226
27.8 Computational Static Phonetics for Synthesis 228
27.9 Supervision 230
27.10 Time Constraints 230
27.11 Summary of the Phonological and Phonetic Models 231
28 Planes for Synthesis 233
Part VIII The Prosodic Framework, Coding and Intonation 235
29 The Phonological Prosodic Framework 237
29.1 Characterising the Phonological and Phonetic Planes 239
30 Sample Code 245
31 XML Coding 249
31.1 Adding Detail 250
31.2 Timing and Fundamental Frequency Control on the Dynamic Plane 256
31.3 The Underlying Markup 257
31.3.1 Syllables and Stress 258
31.3.2 Durations 260
31.4 Intrinsic Durations 261
31.5 Rendering Intonation as a Fundamental Frequency Contour 262
1: Assign Basic f0 Values to All S and F Syllables in the Sentence: the Assigned Value is for the Entire Syllable 263
2: Assign f0 for all U Syllables; Adjust Basic Values 263
3: Remove Monotony 264
4: For Sentences with RESET, where a RESET Point is a Clause or Phrase Boundary 264
32 Prosody: General 265
32.1 The Analysis of Prosody 266
32.2 The Principles of Some Current Models of Intonation Used in Synthesis 268
32.2.1 The Hirst and Di Cristo Model (Including INTSINT) 268
32.2.2 Taylor’s Tilt Model 269
32.2.3 The ToBI (Tones and Break Indices) Model 269
32.2.4 The Basis of Intonation Modelling 270
32.2.5 Details of the ToBI Model 271
32.2.6 The INTSINT (International Transcription System for Intonation) Model 273
32.2.7 The Tatham and Morton Intonation Model 274
Units in T&M Intonation 274
33 Phonological and Phonetic Models of Intonation 277
33.1 Phonological Models 277
33.2 Phonetic Models 277
33.3 Naturalness 278
33.4 Intonation Modelling: Levels of Representation 281
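The four f0-assignment steps listed under section 31.5 above describe a pass over the syllables of a sentence. Below is a toy Python sketch of that overall shape only: the S/F/U syllable classes and the RESET boundary marker come from the contents listing, but all numeric values, the declination slope, and the treatment of each class are illustrative assumptions of mine, not the book's actual rules.

```python
import random

def render_f0(syllables, base=120.0, reset_drop=10.0):
    """Assign an f0 value (Hz) to each syllable of a sentence.

    syllables: list of (label, kind) pairs, where kind is
    'S' (stressed), 'F' (final), 'U' (unstressed), or the
    marker 'RESET' for a clause/phrase boundary.
    Returns a list of (label, f0) pairs.
    """
    random.seed(0)          # deterministic for the example
    contour = []
    current = base          # a baseline that declines across the sentence
    for label, kind in syllables:
        if kind == 'RESET':
            # Step 4: at a clause/phrase boundary, reset the baseline
            current = base - reset_drop
            continue
        if kind in ('S', 'F'):
            # Step 1: basic f0 for stressed and final syllables
            f0 = current
        else:
            # Step 2: unstressed (U) syllables sit below the baseline
            f0 = current * 0.9
        # Step 3: remove monotony with a small random perturbation
        f0 += random.uniform(-2.0, 2.0)
        contour.append((label, round(f0, 1)))
        current -= 2.0      # gradual declination, an assumed slope
    return contour
```

For example, `render_f0([('ma', 'S'), ('ny', 'U'), ('-', 'RESET'), ('dogs', 'F')])` yields three (label, f0) pairs, with the final syllable's value reflecting the post-boundary reset rather than uninterrupted declination.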
Part IX Approaches to Natural-Sounding Synthesis 283
34 The General Approach 285
34.1 Parameterisation 285
34.2 Proposal for a Model to Support Synthesis 286
34.3 Segments and Prosodics: Hierarchical Ordering 287
34.4 A Sample Wrapping in XML 288
34.5 A Prosodic Wrapper for XML 289
34.6 The Phonological Prosodic Framework 290
35 The Expression Wrapper in XML 291
35.1 Expression Wrapping the Entire Utterance 292
35.2 Sourcing for Synthesis 293
35.3 Attributes Versus Elements 294
35.4 Variation of Attribute Sources 296
35.5 Sample Cognitive and Biological Components 297
35.5.1 Parameters of Expression 298
35.5.2 Blends 298
35.5.3 Identifying and Characterising Differences in Expression 298
35.5.4 A Grammar of Expressions 299
36 Advantages of XML in Wrapping 301
36.1 Constraints Imposed by the XML Descriptive System 303
36.2 Variability 303
37 Considerations in Characterising Expression/Emotion 305
37.1 Suggested Characterisation of Features of Expressive/Emotive Content 305
37.1.1 Categories 305
37.1.2 Choices in Dialogue Design 307
37.2 Extent of Underlying Expressive Modelling 308
37.3 Pragmatics 309
38 Summary 313
38.1 Speaking 313
38.2 Mutability 315
Part X Concluding Overview 317
Shared Characteristics Between Database and Output: the Integrity of the Synthesized Utterance 319
Concept-To-Speech 321
Text-To-Speech Synthesis: the Basic Overall Concept 322
Prosody in Text-To-Speech Systems 323
Optimising the Acoustic Signal for Perception 325
Conclusion 326
References 329
Author Index 335
Index 337
- Spoiler:
Developments in Speech Synthesis
|
|
| | | bot Guest
Posts : 317
Reputation : 12
| bot | :: Thu Sep 10 2009, 21:58 | |
| Text-to-Speech Synthesis Paul Taylor
Description:
Text-to-Speech Synthesis provides a complete, end-to-end account of the process of generating speech by computer. Giving an in-depth explanation of all aspects of current speech synthesis technology, it assumes no specialized prior knowledge. Introductory chapters on linguistics, phonetics, signal processing and speech signals lay the foundation, with subsequent material explaining how this knowledge is put to use in building practical systems that generate speech. It covers the very latest techniques, such as unit selection, hidden Markov model synthesis, and statistical text analysis, and also provides explanations of the more traditional techniques such as formant synthesis and synthesis by rule. Weaving together the various strands of this multidisciplinary field, the book is designed for graduate students in electrical engineering, computer science, and linguistics. It is also an ideal reference for practitioners in the fields of human communication interaction and telephony.

Contents
- Spoiler:
1 Introduction 1
1.1 What are text-to-speech systems for? 2
1.2 What should the goals of text-to-speech system development be? 3
1.3 The Engineering Approach 4
1.4 Overview of the book 5
1.4.1 Viewpoints within the book 5
1.4.2 Readers’ backgrounds 6
1.4.3 Background and specialist sections 7
2 Communication and Language 8
2.1 Types of communication 8
2.1.1 Affective communication 8
2.1.2 Iconic communication 9
2.1.3 Symbolic communication 10
2.1.4 Combinations of symbols 11
2.1.5 Meaning, form and signal 12
2.2 Human Communication 13
2.2.1 Verbal communication 14
2.2.2 Linguistic levels 16
2.2.3 Affective Prosody 17
2.2.4 Augmentative Prosody 18
2.3 Communication processes 18
2.3.1 Communication factors 19
2.3.2 Generation 20
2.3.3 Encoding 21
2.3.4 Decoding 22
2.3.5 Understanding 23
2.4 Discussion 23
2.5 Summary 24
3 The Text-to-Speech Problem 26
3.1 Speech and Writing 26
3.1.1 Physical nature 27
3.1.2 Spoken form and written form 28
3.1.3 Use 29
3.1.4 Prosodic and verbal content 31
3.1.5 Component balance 31
3.1.6 Non-linguistic content 32
3.1.7 Semiotic systems 33
3.1.8 Writing Systems 34
3.2 Reading aloud 35
3.2.1 Reading silently and reading aloud 35
3.2.2 Prosody in reading aloud 36
3.2.3 Verbal content and style in reading aloud 37
3.3 Text-to-speech system organisation 38
3.3.1 The Common Form model 38
3.3.2 Other models 39
3.3.3 Comparison 40
3.4 Systems 41
3.4.1 A Simple text-to-speech system 41
3.4.2 Concept to speech 42
3.4.3 Canned Speech and Limited Domain Synthesis 43
3.5 Key problems in Text-to-speech 44
3.5.1 Text classification with respect to semiotic systems 44
3.5.2 Decoding natural language text 46
3.5.3 Naturalness 47
3.5.4 Intelligibility: encoding the message in signal 48
3.5.5 Auxiliary generation for prosody 49
3.5.6 Adapting the system to the situation 50
3.6 Summary 50
4 Text Segmentation and Organisation 52
4.1 Overview of the problem 52
4.2 Words and Sentences 53
4.2.1 What is a word? 54
4.2.2 Defining words in text-to-speech 55
4.2.3 Scope and morphology 59
4.2.4 Contractions and Clitics 60
4.2.5 Slang forms 61
4.2.6 Hyphenated forms 62
4.2.7 What is a sentence? 63
4.2.8 The lexicon 64
4.3 Text Segmentation 64
4.3.1 Tokenisation 64
4.3.2 Tokenisation and Punctuation 65
4.3.3 Tokenisation Algorithms 66
4.3.4 Sentence Splitting 67
4.4 Processing Documents 69
4.4.1 Markup Languages 69
4.4.2 Interpreting characters 70
4.5 Text-to-Speech Architectures 71
4.6 Discussion 76
4.6.1 Further Reading 76
4.6.2 Summary 77
5 Text Decoding: Finding the words from the text 79
5.1 Overview of Text Decoding 79
5.2 Text Classification Algorithms 80
5.2.1 Features and algorithms 80
5.2.2 Tagging and word sense disambiguation 83
5.2.3 Ad-hoc approaches 84
5.2.4 Deterministic rule approaches 84
5.2.5 Decision lists 86
5.2.6 Naive Bayes Classifier 87
5.2.7 Decision trees 88
5.2.8 Part-of-speech Tagging 89
5.3 Non-Natural Language Text 93
5.3.1 Semiotic Classification 93
5.3.2 Semiotic Decoding 96
5.3.3 Verbalisation 96
5.4 Natural Language Text 99
5.4.1 Acronyms and letter sequences 100
5.4.2 Homograph disambiguation 101
5.4.3 Non-homographs 102
5.5 Natural Language Parsing 103
5.5.1 Context Free Grammars 103
5.5.2 Statistical Parsing 105
5.6 Discussion 106
5.6.1 Further reading 109
5.6.2 Summary 110
6 Prosody Prediction from Text 112
6.1 Prosodic Form 112
6.2 Phrasing 113
6.2.1 Phrasing Phenomena 113
6.2.2 Models of Phrasing 114
6.3 Prominence 117
6.3.1 Syntactic prominence patterns 117
6.3.2 Discourse prominence patterns 119
6.3.3 Prominence systems, data and labelling 120
6.4 Intonation and tune 122
6.5 Prosodic Meaning and Function 123
6.5.1 Affective Prosody 123
6.5.2 Suprasegmental 124
6.5.3 Augmentative Prosody 125
6.5.4 Symbolic communication and prosodic style 127
6.6 Determining Prosody from the Text 128
6.6.1 Prosody and human reading 128
6.6.2 Controlling the degree of augmentative prosody 129
6.6.3 Prosody and synthesis techniques 129
6.7 Phrasing prediction 130
6.7.1 Experimental formulation 130
6.7.2 Deterministic approaches 131
6.7.3 Classifier approaches 133
6.7.4 HMM approaches 134
6.7.5 Hybrid approaches 137
6.8 Prominence Prediction 137
6.8.1 Compound noun phrases 137
6.8.2 Function word prominence 139
6.8.3 Data driven approaches 139
6.9 Intonational Tune Prediction 140
6.10 Discussion 140
6.10.1 Labelling schemes and labelling accuracy 140
6.10.2 Linguistic theories and prosody 142
6.10.3 Synthesizing suprasegmental and true prosody 143
6.10.4 Prosody in real dialogues 144
6.10.5 Conclusion 145
6.10.6 Summary 145
7 Phonetics and Phonology 147
7.1 Articulatory phonetics and speech production 147
7.1.1 The vocal organs 148
7.1.2 Sound sources 148
7.1.3 Sound output 151
7.1.4 The vocal tract filter 153
7.1.5 Vowels 153
7.1.6 Consonants 155
7.1.7 Examining speech production 157
7.2 Acoustic phonetics and speech perception 158
7.2.1 Acoustic representations 159
7.2.2 Acoustic characteristics 161
7.3 The communicative use of speech 162
7.3.1 Communicating discrete information with a continuous channel 162
7.3.2 Phonemes, phones and allophones 164
7.3.3 Allophonic variation and phonetic context 168
7.3.4 Coarticulation, targets and transients 169
7.3.5 The continuous nature of speech 170
7.3.6 Transcription 171
7.3.7 The distinctiveness of speech in communication 173
7.4 Phonology: the linguistic organisation of speech 173
7.4.1 Phonotactics 174
7.4.2 Word formation 180
7.4.3 Distinctive Features and Phonological Theories 182
7.4.4 Syllables 185
7.4.5 Lexical Stress 187
7.5 Discussion 190
7.5.1 Further reading 190
7.5.2 Summary 191
8 Pronunciation 193
8.1 Pronunciation representations 193
8.1.1 Why bother? 193
8.1.2 Phonemic and phonetic input 194
8.1.3 Difficulties in deriving phonetic input 195
8.1.4 A Structured approach to pronunciation 196
8.1.5 Abstract phonological representations 197
8.2 Formulating a phonological representation system 198
8.2.1 Simple consonants and vowels 198
8.2.2 Difficult consonants 200
8.2.3 Diphthongs and affricates 201
8.2.4 Approximant-vowel combinations 202
8.2.5 Defining the full inventory 203
8.2.6 Phoneme names 205
8.2.7 Syllabic issues 207
8.3 The Lexicon 208
8.3.1 Lexicon and Rules 209
8.3.2 Lexicon formats 211
8.3.3 The offline lexicon 214
8.3.4 The system lexicon 215
8.3.5 Lexicon quality 216
8.3.6 Determining the pronunciation of unknown words 217
8.4 Grapheme-to-Phoneme Conversion 219
8.4.1 Rule based techniques 219
8.4.2 Grapheme to phoneme alignment 220
8.4.3 Neural networks 220
8.4.4 Pronunciation by analogy 221
8.4.5 Other data driven techniques 222
8.4.6 Statistical Techniques 222
8.5 Further Issues 223
8.5.1 Morphology 223
8.5.2 Language origin and names 224
8.5.3 Post-lexical processing 224
8.6 Summary 225
9 Synthesis of Prosody 227
9.1 Intonation Overview 227
9.1.1 F0 and pitch 228
9.1.2 Intonational form 228
9.1.3 Models of F0 contours 230
9.1.4 Micro-prosody 231
9.2 Intonational Behaviour 231
9.2.1 Intonational tune 232
9.2.2 Downdrift 233
9.2.3 Pitch Range 235
9.2.4 Pitch Accents and Boundary tones 237
9.3 Intonation Theories and Models 239
9.3.1 Traditional models and the British school 239
9.3.2 The Dutch school 239
9.3.3 Autosegmental-Metrical and ToBI models 240
9.3.4 The INTSINT Model 241
9.3.5 The Fujisaki model and Superimpositional Models 242
9.3.6 The Tilt model 244
9.3.7 Comparison 246
9.4 Intonation Synthesis with AM models 248
9.4.1 Prediction of AM labels from text 248
9.4.2 Deterministic synthesis methods 249
9.4.3 Data Driven synthesis methods 250
9.4.4 Analysis with Autosegmental models 250
9.5 Intonation Synthesis with Deterministic Acoustic Models 251
9.5.1 Synthesis with superimpositional models 251
9.5.2 Synthesis with the Tilt model 252
9.5.3 Analysis with Fujisaki and Tilt models 252
9.6 Data Driven Intonation Models 252
9.6.1 Unit selection style approaches 253
9.6.2 Dynamic System Models 254
9.6.3 Hidden Markov models 255
9.6.4 Functional models 256
9.7 Timing 257
9.7.1 Formulation of the timing problem 257
9.7.2 The nature of timing 258
9.7.3 Klatt rules 259
9.7.4 Sums of products model 260
9.7.5 The Campbell model 260
9.7.6 Other regression techniques 261
9.8 Discussion 261
9.8.1 Further Reading 262
9.8.2 Summary 263
10 Signals and Filters 265
10.1 Analogue signals 265
10.1.1 Simple periodic signals: sinusoids 266
10.1.2 General periodic signals 268
10.1.3 Sinusoids as complex exponentials 270
10.1.4 Fourier Analysis 272
10.1.5 Frequency domain 275
10.1.6 The Fourier transform 278
10.2 Digital signals 283
10.2.1 Digital waveforms 283
10.2.2 Digital representations 284
10.2.3 The discrete-time Fourier transform 284
10.2.4 The discrete Fourier transform 285
10.2.5 The z-Transform 286
10.2.6 The frequency domain for digital signals 288
10.3 Properties of Transforms 288
10.3.1 Linearity 288
10.3.2 Time and Frequency Duality 289
10.3.3 Scaling 289
10.3.4 Impulse Properties 289
10.3.5 Time delay 290
10.3.6 Frequency shift 291
10.3.7 Convolution 291
10.3.8 Analytical and Numerical Analysis 292
10.3.9 Stochastic Signals 292
10.4 Digital Filters 292
10.4.1 Difference Equations 293
10.4.2 The impulse response 294
10.4.3 Filter convolution sum 296
10.4.4 Filter transfer function 297
10.4.5 The transfer function and the impulse response 298
10.5 Digital filter analysis and design 299
10.5.1 Polynomial analysis: poles and zeros 299
10.5.2 Frequency Interpretation of z-domain transfer function 302
10.5.3 Filter characteristics 304
10.5.4 Putting it all together 310
10.6 Summary 313
11 Acoustic Models of Speech Production 316
11.1 Acoustic Theory of Speech Production 316
11.1.1 Components in the model 317
11.2 The physics of sound 318
11.2.1 Resonant systems 318
11.2.2 Travelling waves 321
11.2.3 Acoustic waves 323
11.2.4 Acoustic reflection 325
11.3 Vowel Tube Model 326
11.3.1 Discrete time and distance 327
11.3.2 Junction of two tubes 328
11.3.3 Special cases of junction 330
11.3.4 Two tube vocal tract model 331
11.3.5 Single tube model 333
11.3.6 Multi-tube vocal tract model 335
11.3.7 The all pole resonator model 337
11.4 Source and radiation models 338
11.4.1 Radiation 338
11.4.2 Glottal source 338
11.5 Model refinements 341
11.5.1 Modelling the nasal Cavity 341
11.5.2 Source positions in the oral cavity 343
11.5.3 Models with Vocal Tract Losses 344
11.5.4 Source and radiation effects 344
11.6 Discussion 345
11.6.1 Further reading 347
11.6.2 Summary 347
12 Analysis of Speech Signals 349
12.1 Short term speech analysis 350
12.1.1 Windowing 350
12.1.2 Short term spectral representations 351
12.1.3 Frame lengths and shifts 353
12.1.4 The spectrogram 358
12.1.5 Auditory scales 358
12.2 Filter bank analysis 359
12.3 The Cepstrum 360
12.3.1 Cepstrum definition 360
12.3.2 Treating the magnitude spectrum as a signal 361
12.3.3 Cepstral analysis as deconvolution 362
12.3.4 Cepstral analysis discussion 363
12.4 Linear prediction analysis 364
12.4.1 Finding the coefficients: the covariance method 365
12.4.2 Autocorrelation Method 367
12.4.3 Levinson Durbin Recursion 369
12.5 Spectral envelope and vocal tract representations 370
12.5.1 Linear prediction spectra 370
12.5.2 Transfer function poles 372
12.5.3 Reflection coefficients 372
12.5.4 Log area ratios 375
12.5.5 Line spectrum frequencies 375
12.5.6 Linear prediction cepstrum 377
12.5.7 Mel-scaled cepstrum 378
12.5.8 Perceptual linear prediction 378
12.5.9 Formant tracking 378
12.6 Source representations 380
12.6.1 Residual signals 380
12.6.2 Closed-phase analysis 383
12.6.3 Open-phase analysis 385
12.6.4 Impulse/noise models 386
12.6.5 Parameterization of glottal flow signal 387
12.7 Pitch and epoch detection 388
12.7.1 Pitch detection 388
12.7.2 Epoch detection: finding the instant of glottal closure 390
12.8 Discussion 393
12.8.1 Further reading 394
12.8.2 Summary 394
13 Synthesis Techniques Based on Vocal Tract Models 396
13.1 Synthesis specification: the input to the synthesiser 396
13.2 Formant Synthesis 397
13.2.1 Sound sources 398
13.2.2 Synthesizing a single formant 399
13.2.3 Resonators in series and parallel 400
13.2.4 Synthesizing consonants 402
13.2.5 Complete synthesiser 403
13.2.6 The phonetic input to the synthesiser 405
13.2.7 Formant synthesis quality 407
13.3 Classical Linear Prediction Synthesis 408
13.3.1 Comparison with formant synthesis 409
13.3.2 Impulse/noise source model 410
13.3.3 Linear prediction diphone concatenative synthesis 411
13.3.4 Complete synthesiser 413
13.3.5 Problems with the source 414
13.4 Articulatory synthesis 415
13.5 Discussion 417
13.5.1 Further reading 419
13.5.2 Summary 420
14 Synthesis by Concatenation and Signal Processing Modification 422
14.1 Speech units in second generation systems 423
14.1.1 Creating a diphone inventory 424
14.1.2 Obtaining diphones from speech 425
14.2 Pitch synchronous overlap and add (PSOLA) 426
14.2.1 Time domain PSOLA 426
14.2.2 Epoch manipulation 427
14.2.3 How does PSOLA work? 430
14.3 Residual excited linear prediction 433
14.3.1 Residual manipulation 434
14.3.2 Linear Prediction PSOLA 434
14.4 Sinusoidal models 435
14.4.1 Pure sinusoidal models 435
14.4.2 Harmonic/Noise Models 437
14.5 MBROLA 440
14.6 Synthesis from Cepstral Coefficients 440
14.7 Concatenation Issues 442
14.8 Discussion 444
14.8.1 Further Reading 444
14.8.2 Summary 444
15 Hidden Markov Model Synthesis 446
15.1 The HMM formalism 447
15.1.1 Observation probabilities 447
15.1.2 Delta coefficients 450
15.1.3 Acoustic representations and covariance 450
15.1.4 States and transitions 452
15.1.5 Recognising with HMMs 452
15.1.6 Language models 455
15.1.7 The Viterbi algorithm 456
15.1.8 Training HMMs 459
15.1.9 Context-sensitive modelling 463
15.1.10 Are HMMs a good model of speech? 467
15.2 Synthesis from hidden Markov models 468
15.2.1 Finding the most likely observations given the state sequence 469
15.2.2 Finding the most likely observations and state sequence 471
15.2.3 Acoustic representations 474
15.2.4 Context sensitive synthesis models 475
15.2.5 Duration modelling 476
15.2.6 HMM synthesis systems 476
15.3 Labelling databases with HMMs 477
15.3.1 Determining the word sequence 477
15.3.2 Determining the phone sequence 478
15.3.3 Determining the phone boundaries 478
15.3.4 Measuring the quality of the alignments 480
15.4 Other data driven synthesis techniques 481
15.5 Discussion 481
15.5.1 Further Reading 481
15.5.2 Summary 482
16 Unit Selection Synthesis 484
16.1 From Concatenative Synthesis to Unit Selection 484
16.1.1 Extending concatenative synthesis 485
16.1.2 The Hunt and Black Algorithm 488
16.2 Features 489
16.2.1 Base Types 489
16.2.2 Linguistic and Acoustic features 491
16.2.3 Choice of features 492
16.2.4 Types of features 493
16.3 The Independent Feature Target Function Formulation 494
16.3.1 The purpose of the target function 494
16.3.2 Defining a perceptual space 496
16.3.3 Perceptual spaces defined by independent features 496
16.3.4 Setting the target weights using acoustic distances 498
16.3.5 Limitations of the independent feature formulation 502
16.4 The Acoustic Space Target Function Formulation 503
16.4.1 Decision tree clustering 504
16.4.2 General partial-synthesis functions 506
16.5 Join functions 508
16.5.1 Basic issues in joining units 508
16.5.2 Phone class join costs 509
16.5.3 Acoustic distance join costs 510
16.5.4 Combining categorical and acoustic join costs 511
16.5.5 Probabilistic and sequence join costs 512
16.5.6 Join classifiers 514
16.6 Search 515
16.6.1 Base Types and Search 516
16.6.2 Pruning 519
16.6.3 Pre-selection 520
16.6.4 Beam Pruning 520
16.6.5 Multi-pass search 520
16.7 Discussion 521
16.7.1 Unit selection and signal processing 522
16.7.2 Features, costs and perception 523
16.7.3 Example unit selection systems 524
16.7.4 Further Reading 526
16.7.5 Summary 526
17 Further Issues 528
17.1 Databases 528
17.1.1 Unit Selection Databases 528
17.1.2 Text materials 529
17.1.3 Prosody databases 530
17.1.4 Labelling 530
17.1.5 What exactly is hand labelling? 531
17.1.6 Automatic labelling 532
17.1.7 Avoiding explicit labels 532
17.2 Evaluation 533
17.2.1 System Testing: Intelligibility and Naturalness 534
17.2.2 Word recognition tests 534
17.2.3 Naturalness tests 535
17.2.4 Test data 536
17.2.5 Unit or Component testing 536
17.2.6 Competitive evaluations 538
17.3 Audiovisual Speech Synthesis 538
17.3.1 Speech Control 539
17.4 Synthesis of Emotional and Expressive Speech 540
17.4.1 Describing Emotion 540
17.4.2 Synthesizing emotion with prosody control 541
17.4.3 Synthesizing emotion with voice transformation 542
17.4.4 Unit selection and HMM techniques 542
17.5 Summary 543
18 Conclusion 545
18.1 Speech Technology and Linguistics 545
18.2 Future Directions 548
18.3 Conclusion 550
Appendix 552
A Probability 552
A.1 Discrete Probabilities 552
A.1.1 Discrete Random Variables 552
A.1.2 Probability Mass Function 553
A.1.3 Expected Values 553
A.1.4 Moments of a PMF 554
A.2 Pairs of discrete random variables 554
A.2.1 Marginal Distributions 555
A.2.2 Independence 555
A.2.3 Expected Values 556
A.2.4 Moments of a joint distribution 556
A.2.5 Higher-Order Moments and covariance 556
A.2.6 Correlation 557
A.2.7 Conditional Probability 557
A.2.8 Bayes’ Rule 558
A.2.9 Sum of Random Variables 558
A.2.10 The chain rule 559
A.2.11 Entropy 559
A.3 Continuous Random Variables 560
A.3.1 Continuous Random Variables 560
A.3.2 Expected Values 562
A.3.3 Gaussian Distribution 562
A.3.4 Uniform Distribution 562
A.3.5 Cumulative Density Functions 563
A.4 Pairs of Continuous Random Variables 563
A.4.1 Independence vs Uncorrelated 564
A.4.2 Sum of Two Random Variables 565
A.4.3 Entropy 565
A.4.4 Kullback-Leibler Distance 565
B Phone Definitions 567
- Spoiler:
Text-to-Speech Synthesis
|
|
| | | bot Guest
Posts : 317
Reputation : 12
| bot | :: Thu Sep 10 2009, 22:16 | |
| Improvements in Speech Synthesis E. Keller, G. Bailly, A. Monaghan, and J. Terken
Description:
Naturalness in synthetic speech is one of the most intractable problems in information technology today. Although speech synthesis systems have improved considerably over the last 20 years, they rarely sound entirely like human speakers.
Why is this so, and what can be done about it?
• Prosodic processing must be rendered more varied and more appropriate to the speech situation. Timing, melodic control and the relationships between the various prosodic parameters need increased attention.
• Signal processing systems must be developed and perfected that are capable of generating more than just one voice from a database.
• A better understanding must be achieved of what distinguishes one voice from another, and of how speech styles differ between simply reading aloud numbers and sentences and their use in interactive speech.
• New evaluation methodologies should be developed to provide objective and subjective measurements of the intelligibility of the synthetic speech and the cognitive load imposed upon the listener by impoverished stimuli.
• Adequate text markup systems must be proposed and tested with multiple languages in real-world situations.
• Further research is required to integrate speech synthesis systems into larger natural-language processing systems.
Improvements in Speech Synthesis presents the latest research in the above areas. Contributors include speech synthesis specialists from 16 countries, with experience in the development of systems for 12 European languages. This volume emerges from a four-year European COST project focussed on "The Naturalness of Synthetic Speech", and will be a valuable text for everyone involved in speech synthesis.
- Спойлер:
Part I Issues in Signal Generation
1 Towards Greater Naturalness: Future Directions of Research in Speech Synthesis
2 Towards More Versatile Signal Generation Systems
3 A Parametric Harmonic+Noise Model
4 The COST 258 Signal Generation Test Array
5 Concatenative Text-to-Speech Synthesis Based on Sinusoidal Modeling
6 Shape Invariant Pitch and Time-Scale Modification of Speech Based on a Harmonic Model
7 Concatenative Speech Synthesis Using SRELP
Part II Issues in Prosody
8 Prosody in Synthetic Speech: Problems, Solutions and Challenges
9 State-of-the-Art Summary of European Synthetic Prosody R&D
10 Modeling F0 in Various Romance Languages: Implementation in Some TTS Systems
11 Acoustic Characterization of the Tonic Syllable in Portuguese
12 Prosodic Parameters of Synthetic Czech: Developing Rules for Duration and Intensity
13 MFGI, a Linguistically Motivated Quantitative Model of German Prosody
14 Improvements in Modeling the F0 Contour for Different Types of Intonation Units in Slovene
15 Representing Speech Rhythm
16 Phonetic and Timing Considerations in a Swiss High German TTS System
17 Corpus-based Development of Prosodic Models Across Six Languages
18 Vowel Reduction in German Read Speech
Part III Issues in Styles of Speech
19 Variability and Speaking Styles in Speech Synthesis
20 An Auditory Analysis of the Prosody of Fast and
21 Automatic Prosody Modeling of Galician and its Application to Spanish
22 Reduction and Assimilatory Processes in Conversational French Speech: Implications for Speech Synthesis
23 Acoustic Patterns of Emotions
24 The Role of Pitch and Tempo in Spanish Emotional Speech: Towards Concatenative Synthesis
25 Voice Quality and the Synthesis of Affect
26 Prosodic Parameters of a 'Fun' Speaking Style
27 Dynamics of the Glottal Source Signal: Implications for Naturalness in Speech Synthesis
28 A Nonlinear Rhythmic Component in Various Styles of Speech
Part IV Issues in Segmentation and Mark-up
29 Issues in Segmentation and Mark-up
30 The Use and Potential of Extensible Mark-up (XML) in Speech Generation
31 Mark-up for Speech Synthesis: A Review and Some Suggestions
32 Automatic Analysis of Prosody for Multi-lingual Speech Corpora
33 Automatic Speech Segmentation Based on Alignment with a Text-to-Speech System
34 Using the COST 249 Reference Speech Recognizer for Automatic Speech Segmentation
Part V Future Challenges
35 Future Challenges
36 Towards Naturalness, or the Challenge of Subjectiveness
37 Synthesis Within Multi-Modal Systems
38 A Multi-Modal Speech Synthesis Tool Applied to Audio-Visual Prosody
39 Interface Design for Speech Synthesis Systems
Index
- Spoiler:
Improvements in Speech Synthesis
|
|
| | | bot Guest
Posts : 317
Reputation : 12
| bot | :: Thu Sep 10 2009, 22:43 | |
| Speech Acoustics and Phonetics: Selected Writings (Text, Speech and Language Technology) Gunnar Fant
Description:
The overall aim of the book is to provide an integrated view of the separate stages of the speech chain, covering the production process, speech data analysis, and speech perception. Analyses of information-bearing elements of the speech signal have found applications in linguistic theory and in the knowledge base of speech technology, with special reference to speech synthesis.
The book contains 19 selected articles organized in 6 chapters: Speech research overview with a historical outline, Speech production and synthesis, The voice source, Speech analysis and features, Speech perception, Prosody.
Each chapter is preceded by an introduction that includes suggestions for additional reading. A list of all the author's publications since 1945 is included, supplemented by an ordering into categories. The articles have been selected to ensure representative coverage of the field; some of them, primarily those on speech acoustics and the human voice source, have been published previously. During the last 15 years a major emphasis has been on speech prosody, with several novel approaches. A recent major article provides a broad frame, starting with aerodynamics and voice-source properties and leading up to intonation analysis, prosodic grouping, and rules for text-to-speech synthesis; these are illustrated in an audio file. A novel feature, introduced in analysis as well as synthesis, is a parameter of perceived syllable and word prominence with acoustic correlates and ties to lexical categories. The author was involved in the early development of distinctive-feature theory together with Roman Jakobson and Morris Halle; applications to Swedish are contained in the book. A major issue in current phonology and phonetics has been the search for absolute invariance of speech features; with the growing insight into contextual variability, however, this remains a pseudo-problem. To approach the essence of the speech code, we need to structure variability with respect to all possible contextual factors. As the author argues, this is not only a requirement for a sound development of general phonetics and phonology; it is also a prerequisite for realizing the advanced aims of speech technology. Computer power cannot substitute for fundamental knowledge of the human speech communication process. The book should accordingly be of interest to several disciplines: not only speech technology, linguistics, phonetics, and acoustics, but also the psychology and physiology of speech and hearing, with applications in medical science.
CONTENTS Foreword vii Preface ix Introduction xi List of selected articles xiii 1. Speech research overview 1 2. Speech production and synthesis 15 3. The voice source 93 4. Speech analysis and features 143 5. Speech perception 199 6. Prosody 221 Publication list 1945–2004 301 Reference categories 319
| bot (Guest) | :: Thu Sep 17 2009, 00:37 |
| Multilingual Speech Processing Tanja Schultz, Katrin Kirchhoff
Description: Tanja Schultz and Katrin Kirchhoff have compiled a comprehensive overview of speech processing from a multilingual perspective. By taking this all-inclusive approach to speech processing, the editors have included theories, algorithms, and techniques that are required to support spoken input and output in a large variety of languages. This book presents a comprehensive introduction to research problems and solutions, both from a theoretical as well as a practical perspective, and highlights technology that incorporates the increasing necessity for multilingual applications in our global community.
Current challenges of speech processing and the feasibility of sharing data and system components across different languages guide contributors in their discussions of trends, prognoses, and open research issues. This includes not only automatic speech recognition and speech synthesis, but also speech-to-speech translation, dialog systems, automatic language identification, and handling non-native speech. The book is complemented by an overview of multilingual resources, important research trends, and actual speech processing systems that are being deployed in multilingual human-human and human-machine interfaces.
Researchers and developers in industry and academia with different backgrounds but a common interest in multilingual speech processing will find an excellent overview of research problems and solutions detailed from theoretical and practical perspectives.
* State-of-the-art research with a global perspective by authors from the USA, Asia, Europe, and South Africa
* The only comprehensive introduction to multilingual speech processing currently available
* Detailed presentation of technological advances integral to security, financial, cellular and commercial applications
Contents
Contributor Biographies xvii Foreword xxvii
1 Introduction 1
2 Language Characteristics 5 2.1 Languages and Dialects .................................................. 5 2.2 Linguistic Description and Classification .................................. 8 2.3 Language in Context ..................................................... 20 2.4 Writing Systems ......................................................... 22 2.5 Languages and Speech Technology ....................................... 30
3 Linguistic Data Resources 33 3.1 Demands and Challenges of Multilingual Data-Collection Efforts .......... 33 3.2 International Efforts and Cooperation ..................................... 40 3.3 Data Collection Efforts in the United States ............................... 44 3.4 Data Collection Efforts in Europe ......................................... 55 3.5 Overview of Existing Language Resources in Europe ...................... 64
4 Multilingual Acoustic Modeling 71 4.1 Introduction ............................................................. 71 4.2 Problems and Challenges ................................................. 79 4.3 Language Independent Sound Inventories and Representations ............. 91 4.4 Acoustic Model Combination............................................. 102 4.5 Insights and Open Problems .............................................. 118
5 Multilingual Dictionaries 123 5.1 Introduction ............................................................. 123 5.2 Multilingual Dictionaries ................................................. 125 5.3 What Is a Word? ......................................................... 129 5.4 Vocabulary Selection..................................................... 141 5.5 How to Generate Pronunciations .......................................... 149 5.6 Discussion .............................................................. 166
6 Multilingual Language Modeling 169 6.1 Statistical Language Modeling............................................ 169 6.2 Model Estimation for New Domains and Speaking Styles .................. 174 6.3 Crosslingual Comparisons: A Language Modeling Perspective ............. 177 6.4 Crosslinguistic Bootstrapping for Language Modeling ..................... 193 6.5 Language Models for Truly Multilingual Speech Recognition .............. 199 6.6 Discussion and Concluding Remarks ..................................... 202
7 Multilingual Speech Synthesis 207 7.1 Background ............................................................. 208 7.2 Building Voices in New Languages ....................................... 208 7.3 Database Design ......................................................... 213 7.4 Prosodic Modeling ....................................................... 216 7.5 Lexicon Building ........................................................ 219 7.6 Non-native Spoken Output ............................................... 230 7.7 Summary ................................................................ 231
8 Automatic Language Identification 233 8.1 Introduction ............................................................. 234 8.2 Human Language Identification .......................................... 235 8.3 Databases and Evaluation Methods ....................................... 240 8.4 The Probabilistic LID Framework ........................................ 242 8.5 Acoustic Approaches ..................................................... 245 8.6 Phonotactic Modeling .................................................... 251 8.7 Prosodic LID ............................................................ 262 8.8 LVCSR-Based LID ...................................................... 266 8.9 Trends and Open Problems in LID ........................................ 268
9 Other Challenges: Non-native Speech, Dialects, Accents, and Local Interfaces 9.1 Introduction ............................................................ 273 9.2 Characteristics of Non-native Speech .................................... 276 9.3 Corpus Analysis ........................................................ 278 9.4 Acoustic Modeling Approaches for Non-native Speech ................... 287 9.5 Adapting to Non-native Accents in ASR ................................. 288 9.6 Combining Speaker and Pronunciation Adaptation........................ 298 9.7 Cross-Dialect Recognition of Native Dialects ............................ 299 9.8 Applications ............................................................ 301 9.9 Other Factors in Localizing Speech-Based Interfaces ..................... 309 9.10 Summary............................................................... 315
10 Speech-to-Speech Translation 317 10.1 Introduction ............................................................ 317 10.2 Statistical and Interlingua-Based Speech Translation Approaches .......... 320 10.3 Coupling Speech Recognition and Translation ............................ 341 10.4 Portable Speech-to-Speech Translation: The ATR System ................. 347 10.5 Conclusion ............................................................. 394
11 Multilingual Spoken Dialog Systems 399 11.1 Introduction ............................................................ 399 11.2 Previous Work .......................................................... 403 11.3 Overview of the ISIS System ............................................ 407 11.4 Adaptivity to Knowledge Scope Expansion .............................. 417 11.5 Delegation to Software Agents .......................................... 425 11.6 Interruptions and Multithreaded Dialogs ................................. 427 11.7 Empirical Observations on User Interaction with ISIS .................... 433 11.8 Implementation of Multilingual SDS in VXML .......................... 437 11.9 Summary and Conclusions .............................................. 443
Bibliography 449 Index 491
| bot (Guest) | :: Thu Sep 24 2009, 23:38 |
| Speech Analysis, Synthesis and Perception J. Flanagan
In this monograph J. Flanagan, the well-known American scientist, examines in detail a wide range of questions concerning the properties of speech as a carrier of information, its basic parameters, and the problems of analysis, synthesis, and automatic recognition. The characteristics of speech communication channels are evaluated. Much attention is given to problems of synthetic telephony; various vocoders, semi-vocoders, and other methods of reducing the bandwidth occupied by speech are described. The book will find many readers not only among specialists in communications engineering, but also among mathematicians and cyberneticists, physiologists, linguists, philologists, acousticians, and other specialists dealing with the transmission, reception, storage, and study of speech signals and their use for machine control. CONTENTS
Preface to the Russian edition From the author From the editor of the Russian translation
I. Speech Communication 1.1. The origins of telephony 1.2. Efficient transmission of speech 1.3. Human capacity as an information channel 1.4. Synthetic telephony: an approach to greater efficiency
II. The Speech Production Process 2.1. Physiology of the speech organs 2.2. The sounds of speech 2.2.1. General remarks 2.2.2. Vowels 2.2.3. Consonants 2.3. Quantitative description of speech
III. Acoustic Properties of the Vocal Apparatus 3.1. The vocal tract as an acoustic system 3.2. Equivalent circuit for a lossy cylindrical pipe 3.2.1. General relations 3.2.2. The acoustic "L" 3.2.3. The acoustic "R" 3.2.4. The acoustic "C" 3.2.5. The acoustic "G" 3.2.6. Summary of equivalent representations of acoustic quantities 3.3. The radiation load at the mouth and nostrils 3.4. Sound propagation in the space around the head 3.5. The voice source 3.5.1. Vocal-cord excitation 3.5.2. Glottal impedance 3.5.3. Small-signal equivalent circuit of the voice source 3.6. Noise and impulse excitation sources in the tract 3.7. Some properties of the vocal-tract transfer function 3.7.1. Definition of the transfer function 3.7.2. Effect of the radiation load on the tract's pole distribution 3.7.3. Effect of glottal impedance on the tract's pole distribution 3.7.4. Effect of cavity-wall vibration 3.7.5. Two-tube approximation of the vocal tract 3.7.6. Excitation by a source displaced forward along the tract axis 3.7.7. Effect of the nasal tract 3.7.8. A four-tube, three-parameter approximation to vowel articulation 3.7.9. Multi-tube models and electrical analogs of the vocal tract 3.8. Applying basic properties of speech and hearing in synthetic telephony
IV. The Ear and Hearing 4.1. Structure of the ear 4.1.1. General layout 4.1.2. The outer ear 4.1.3. The middle ear 4.1.4. The inner ear 4.1.5. Conversion of mechanical vibration into neural excitation 4.1.6. Pathways in the auditory nervous system 4.2. Mathematical models of the ear 4.2.1. Statement of the problem 4.2.2. A model of the basilar membrane 4.2.3. Transfer function of the middle ear 4.2.4. Combined transfer function of the middle ear and basilar membrane 4.2.5. An electrical circuit modeling basilar-membrane displacement 4.2.6. Computer simulation of membrane motion 4.2.7. Modeling the cochlea with a transmission line 4.3. Illustrative relations between subjective and physiological behavior 4.3.1. Basic assumptions 4.3.2. Pitch perception 4.3.3. Binaural localization 4.3.4. Threshold sensitivity 4.3.5. Processing of complex signals in the auditory system
V. Techniques for Speech Analysis 5.1. Spectral analysis of speech 5.1.1. Short-time frequency analysis 5.1.2. Measurement of the instantaneous spectrum 5.1.3. Choice of the weighting function 5.1.4. The sound spectrograph 5.1.5. The short-time correlation function and the instantaneous power spectrum 5.1.6. The average power spectrum 5.1.7. Measurement of the average power spectrum of speech 5.2. Formant analysis of speech 5.2.1. On the formant structure of speech 5.2.2. Extraction of formant frequencies 5.2.3. Measurement of formant bandwidths 5.3. Analysis of voice pitch 5.4. Articulatory analysis of the speech production mechanism 5.5. Automatic speech recognition 5.6. Automatic speaker recognition
VI. Speech Synthesis 6.1. Mechanical talking machines: a historical overview 6.2. Electrical methods of speech synthesis 6.2.1. Methods of reconstructing signals with a specified spectrum 6.2.2. Two-port synthesizers 6.2.3. Transmission-line analogs of the vocal tract 6.2.4. Excitation of electrical synthesizers 6.2.5. Radiation considerations 6.2.6. Computer simulation of speech synthesis
VII. Perception of Speech and Speech-like Sounds 7.1. Differential and absolute discrimination 7.2. Differential discrimination along the dimensions of the speech signal 7.2.1. On the ear's sensitivity to changes in the speech-signal dimensions 7.2.2. Difference limens for formant-peak frequencies 7.2.3. Difference limens for formant-peak amplitudes 7.2.4. Sensitivity to formant bandwidth 7.2.5. Sensitivity to fundamental frequency 7.2.6. Difference limens for excitation intensity 7.2.7. Sensitivity to zeros in the spectrum of the pitch pulses 7.2.8. Discriminability of maxima and minima of a noise spectrum 7.2.9. Other estimates obtained by direct comparison 7.2.10. Differential discriminability in the articulatory domain 7.3. Absolute discrimination of speech and speech-like sounds 7.3.1. Absolute identification of sounds 7.3.2. Absolute identification of syllables 7.3.3. Effects of learning and linguistic association on the absolute identification of speech-like signals 7.3.4. Effects of linguistic association on differential discriminability 7.4. Effects of context and vocabulary on speech perception 7.5. The units of speech perception 7.6. Articulation testing of telephone channel quality 7.7. Calculating intelligibility from channel characteristics and noise level; the articulation index 7.8. Supplementary sensory channels for speech perception 7.8.1. The "visible speech" spectrograph 7.8.2. The tactile vocoder 7.8.3. The low-frequency vocoder
VIII. Systems of Synthetic Telephony 8.1. Channel vocoders 8.1.1. Homer Dudley's invention 8.1.2. Multiplexing of channel vocoders 8.1.3. Vocoder performance 8.2. Reduced-redundancy channel vocoders 8.2.1. The peak-picking vocoder 8.2.2. Linear transformation of the spectral parameter signals of a channel vocoder 8.2.3. Vocoders with spectral-function templates 8.3. Semi-vocoders 8.3.1. The problem of improving naturalness 8.3.2. Multiplexing and digitization 8.4. Correlation vocoders 8.5. Formant vocoders 8.5.1. The principle of formant analysis and synthesis of speech 8.5.2. Multiplexing and digitization of formant vocoders 8.5.3. Formant semi-vocoders 8.6. Articulatory vocoders 8.7. Other bandwidth-reduction methods 8.7.1. Band limitation and signal-to-noise ratio 8.7.2. Amplitude quantization and coding; clipped speech 8.7.3. Frequency division and multiplication; time compression and expansion 8.7.4. Time-assignment speech interpolation (TASI) 8.7.5. Representation of speech by orthogonal functions
References Bibliography added by the editor of the Russian translation
| bot (Guest) | :: Thu Sep 24 2009, 23:58 |
| Speech Synthesizer Circuits Christian Tavernier
This book presents descriptions and schematic diagrams of a number of devices, such as a music-on-hold circuit for a telephone, a voice alarm, and a tapeless telephone answering machine. It also covers stand-alone modules designed to work as part of other circuits and devices that need voice output. Building the circuits presented here requires neither special development systems nor a computer, so the guidance in this book will be useful not only to specialists in speech synthesis but also to electronics hobbyists.
Preface Introduction to the theory of speech synthesis Why speech synthesis The human vocal apparatus Formant synthesis Linear predictive coding Phoneme synthesis Digital speech coding Speech recognition Conclusion
Basic modules Record/playback unit with 256 Kbit RAM Principle of operation Schematic diagram Construction Testing and operation Record/playback unit with 1 Mbit RAM Schematic diagram Construction Testing and operation Playback unit with 512 Kbit programmable ROM Schematic diagram Construction Testing and operation Programming the UVPROM
Programmers for speech synthesizers Stand-alone programmer Programming the UVPROM Principle of operation of the stand-alone programmer Schematic diagram Construction Testing and operation Pseudo-ROM programmer Schematic diagram Construction Testing and operation
Devices using speech synthesis Mini-PBX with speech synthesis A few telephony concepts Schematic diagram Construction Testing and operation Installation Voice alarm Schematic diagram Construction Testing and operation Telephone answering machine with speech synthesis Telephone-line interface module Answering-machine module Schematic diagram Construction Testing and operation
Speech synthesizer for a computer Principle of operation Synthesizer module Interfacing with the computer Construction Connecting to the computer Testing and operation Programming
Other applications of speech synthesis Protecting voice messages Analog voice scrambling Digital voice scrambling Hybrid voice scrambling The CML PX244 chip Schematic diagram Construction Testing and operation Digital octave converter The MSM6322 chip Schematic diagram Construction Testing and operation
| bot (Guest) | :: Fri Sep 25 2009, 19:45 |
| Expression in Speech: Analysis and Synthesis Mark Tatham, Katherine Morton
Contents
Acknowledgements x Introduction 1
PART I EXPRESSION IN SPEECH Chapter 1 Natural Speech 11 1.1 The production and perception of speech 12 1.2 The basic model 17 1.3 Developing a model to include expressive content 21 1.4 Evaluating lines of research 22 1.5 The perception of a waveform’s expressive content 24 1.6 The test-bed for modelling expression 25
Chapter 2 Speech Synthesis 28 2.1 Modern synthesis techniques 28 2.2 Limitations of the synthesizer type 34
Chapter 3 Expression in Natural Speech 37 3.1 What lies behind expressive speech? 38 3.2 How does expression get into speech? 39 3.3 Neutral speech 40 3.4 Degrees of expression 41 3.5 The dynamic nature of expression 44 3.6 The ‘acoustic correlates’ paradigm for investigating expression 47 3.7 Some acoustic correlates 53 3.8 Hearing thresholds: the difference limen concept 56 3.9 Data collection 57 3.10 Listener reaction to expressive utterances 61
Chapter 4 Expression in Synthetic Speech 65 4.1 Synthesizing different modes of expression 65 4.2 What is needed for expressive synthetic speech? 69 4.3 Evaluating the results 71 4.4 Expression is systematic, but non-linear 73 4.5 Integrity of speakers and their expression 75 4.6 Optimizing synthesis techniques for rendering expression 78 4.7 Modelling naturalness and expressive content 82
Chapter 5 The Perception of Expression 86 5.1 The objective of speaker expression 87 5.2 Current limits to characterizing the acoustic triggers of listener reaction 88 5.3 Characterizing listener reaction to expressive signals 89 5.4 The listener’s ability to differentiate signals 90 5.5 Non-linearity in the acoustic/perceptual relationship 91
PART II TRANSFERRING NATURAL EXPRESSION TO SYNTHESIS Chapter 6 The State of the Art 97 6.1 The general approach 97 6.2 The representation of emotion in the minds of speakers and listeners 100 6.3 Defining emotion in general 102 6.4 Defining emotion in terms of acoustic correlates 105 6.5 Variability among acoustic correlates 107 6.6 The non-uniqueness of acoustic correlates 110 6.7 Reducing the number of variables 111 6.8 The range of emotive effects 113 6.9 The state of the art in synthesizing prosody 115 6.10 The theoretical basis 119 6.11 The state of the art in synthesizing expressiveness 121
Chapter 7 Emotion in Speech Synthesis 124 7.1 Type of synthesizer 125 7.2 Using prosody as the basis for synthesizing expression 129 7.3 Assessment and evaluation of synthesis results 132 7.4 Synthesis of emotions in speech: general problems 136 7.5 Linking parameters of emotion with acoustic parameters 140
Chapter 8 Recent Developments in Synthesis Models 150 8.1 The current state of thinking 150 8.2 Subtlety of expression 152 8.3 The expression space: metrics 153 8.4 Natural synthesis: feedback in the dialogue environment 154 8.5 Contemporary changes of approach to speech 158 8.6 Providing the synthesizer with listener feedback 160 8.7 Some production-for-perception considerations with expressive speech 164
PART III EXPRESSION AND EMOTION: THE RESEARCH Chapter 9 The Biology and Psychology Perspectives 167 9.1 Finding expressive content in speech 167 9.2 Is there a basis for modelling human expression? 168 9.3 Emotion: what is it? 169 9.4 The source of emotive content 172 9.5 Production of emotion: biological accounts 173 9.6 Production of emotion: cognitive accounts, with little or no biological substrate 177 9.7 Production of emotion: linking the biological and cognitive approaches 183 9.8 The function of emotion 187 9.9 Parameterization of emotion 189 9.10 Secondary emotions 189 9.11 Language terms and the use of words in characterizing emotion 191 9.12 The problems of labelling and classification 195 9.13 Concluding remarks 196
Chapter 10 The Linguistics, Phonology, and Phonetics Perspective 198 10.1 The nature of emotion 198 10.2 Databases for investigating expressiveness in the speech waveform 210 10.3 Speakers 216 10.4 Listeners 226
Chapter 11 The Speech Technology Perspective 236 11.1 Synthesis feasibility studies 236 11.2 Testing models of expression 250 11.3 Automatic speech recognition: the other side of the coin 259
Chapter 12 The Influence of Emotion Studies 264 12.1 How research into emotion can usefully influence work in speech 264 12.2 Emotion and speech synthesis 265 12.3 Prelude to an underlying model of emotion: the inadequacies of the speech model 267 12.4 An integrated physical/cognitive language model 269 12.5 Introducing a possible transferable model 273 12.6 Building a model for emotive synthesis: the goals 277 12.7 The evidence supporting biological and cognitive models suitable for speech work 280 12.8 Concluding and summarizing remarks 284
PART IV DEVELOPMENT OF AN INTEGRATED MODEL OF EXPRESSION Chapter 13 The Beginnings of a Generalized Model of Expression 289 13.1 Defining expressive speech 290 13.2 The simple composite soundwave model 294 13.3 Short-term and long-term expressiveness 296
Chapter 14 All Speech is Expression-Based 300 14.1 Neutral expression 302 14.2 Listener message sampling 303 14.3 The expression envelope 307 14.4 Defining neutral speech 310 14.5 Parametric representations 313 14.6 Data collection 320
Chapter 15 Expressive Synthesis: The Longer Term 327 15.1 What does the synthesizer need to do? 327 15.2 Phonology in the high-level system 331 15.3 Defining expression and transferring results from psychology 341 15.4 The supervisor model applied to expression 346 15.5 Is it critical how the goal of good synthesis is achieved? 347 15.6 Implications of utterance planning and supervision 349 15.7 Are synthesis systems up to the job? 349
Chapter 16 A Model of Speech Production Based on Expression and Prosody 355 16.1 The prosodic framework 355 16.2 Planning and rendering 357 16.3 Phonetics as a dynamic reasoning device 360 16.4 Phonological and Cognitive Phonetic processes 362 16.5 The speech production model’s architecture 364 16.6 Prosodic and expressive detail 374 16.7 Evaluating competing demands for expressive content: a task for the CPA 380 16.8 Spectral and articulatory detail 383 16.9 Planning and rendering utterances within prosodic wrappers 384 16.10 Speaking a specific utterance with expression 386 16.11 The proposed model of speech production 387
Conclusion 389 References 393 Bibliography 411 Author index 413 Subject index 417
| bot (Guest) | :: Mon Sep 28 2009, 20:47 |
| Applications of Digital Signal Processing to Audio and Acoustics Mark Kahrs, Karlheinz Brandenburg
Contents
List of Figures List of Tables Contributing Authors Introduction Audio quality determination based on perceptual measurement techniques 1 John G. Beerends 1.1 Introduction 1 1.2 Basic measuring philosophy 2 1.3 Subjective versus objective perceptual testing 6 1.4 Psychoacoustic fundamentals of calculating the internal sound repre- sentation 8 1.5 Computation of the internal sound representation 13 1.6 The perceptual audio quality measure (PAQM) 17 1.7 Validation of the PAQM on speech and music codec databases 20 1.8 Cognitive effects in judging audio quality 22 1.9 ITU Standardization 29 1.9.1 ITU-T, speech quality 30 1.9.2 ITU-R, audio quality 35 1. 10 Conclusions 37 2 Perceptual Coding of High Quality Digital Audio 39 Karlheinz Brandenburg 2.1 Introduction 39 2.2 Some Facts about Psychoacoustics 2.2.1 Masking in the Frequency Domain 2.2.2 Masking in the Time Domain 2.2.3 Variability between listeners 2.3 Basic ideas of perceptual coding 2.3.1 Basic block diagram 2.3.2 Additional coding tools 2.3.3 Perceptual Entropy 2.4 Description of coding tools 2.4.1 Filter banks 2.4.2 Perceptual models 2.4.3 Quantization and coding 2.4.4 Joint stereo coding 2.4.5 Prediction 2.4.6 Multi-channel: to matrix or not to matrix 2.5 Applying the basic techniques: real coding systems 2.5.1 Pointers to early systems (no detailed description) 2.5.2 MPEG Audio 2.5.3 MPEG-2 Advanced Audio Coding (MPEG-2 AAC) 2.5.4 MPEG-4 Audio 2.6 Current Research Topics 2.7 Conclusions 3 Reverberation Algorithms William G. 
Gardner 3.1 Introduction 3.1.1 Reverberation as a linear filter 3.1.2 Approaches to reverberation algorithms 3.2 Physical and Perceptual Background 3.2.1 Measurement of reverberation 3.2.2 Early reverberation 3.2.3 Perceptual effects of early echoes 3.2.4 Reverberation time 3.2.5 Modal description of reverberation 3.2.6 Statistical model for reverberation 3.2.7 Subjective and objective measures of late reverberation 3.2.8 Summary of framework 3.3 Modeling Early Reverberation 3.4 Comb and Allpass Reverberators 3.4.1 Schroeder’s reverberator 3.4.2 The parallel comb filter 3.4.3 Modal density and echo density 3.4.4 Producing uncorrelated outputs 3.4.5 Moorer’s reverberator 3.4.6 Allpass reverberators 3.5 Feedback Delay Networks 3.5.1 Jot’s reverberator 119 3.5.2 Unitary feedback loops 121 3.5.3 Absorptive delays 122 3.5.4 Waveguide reverberators 123 3.5.5 Lossless prototype structures 125 3.5.6 Implementation of absorptive and correction filters 128 3.5.7 Multirate algorithms 128 3.5.8 Time-varying algorithms 129 3.6 Conclusions 130 4 Digital Audio Restoration Simon Godsill, Peter Rayner and Olivier Cappé 4.1 Introduction 4.2 Modelling of audio signals 4.3 Click Removal 4.3.1 Modelling of clicks 4.3.2 Detection 4.3.3 Replacement of corrupted samples 4.3.4 Statistical methods for the treatment of clicks 4.4 Correlated Noise Pulse Removal 4.5 Background noise reduction 4.5.1 Background noise reduction by short-time spectral attenuation 164 4.5.2 Discussion 177 4.6 Pitch variation defects 177 4.6.1 Frequency domain estimation 179 4.7 Reduction of Nonlinear Amplitude Distortion 182 4.7.1 Distortion Modelling 183 4.7.2 Nonlinear Signal Models 184 4.7.3 Application of Nonlinear models to Distortion Reduction 186 4.7.4 Parameter Estimation 188 4.7.5 Examples 190 4.7.6 Discussion 190 4.8 Other areas 192 4.9 Conclusion and Future Trends 193 5 Digital Audio System Architecture Mark Kahrs 5.1 Introduction 5.2 Input/Output 5.2.1 Analog/Digital Conversion 5.2.2 Sampling clocks 5.3 
Processing 5.3.1 Requirements 5.3.2 Processing 5.3.3 Synthesis 5.3.4 Processors 5.4 Conclusion 6 Signal Processing for Hearing Aids James M. Kates 6.1 Introduction 6.2 Hearing and Hearing Loss 6.2.1 Outer and Middle Ear 6.3 Inner Ear 6.3.1 Retrocochlear and Central Losses 6.3.2 Summary 6.4 Linear Amplification 6.4.1 System Description 6.4.2 Dynamic Range 6.4.3 Distortion 6.4.4 Bandwidth 6.5 Feedback Cancellation 6.6 Compression Amplification 6.6.1 Single-Channel Compression 6.6.2 Two-Channel Compression 6.6.3 Multi-Channel Compression 6.7 Single-Microphone Noise Suppression 6.7.Adaptive Analog Filters 6.7.2 Spectral Subtraction 6.7.3 Spectral Enhancement 6.8 Multi-Microphone Noise Suppression 6.8.1 Directional Microphone Elements 6.8.2 Two-Microphone Adaptive Noise Cancellation 6.8.3 Arrays with Time-Invariant Weights 6.8.4 Two-Microphone Adaptive Arrays 6.8.5 Multi-Microphone Adaptive Arrays 6.8.6 Performance Comparison in a Real Room 6.9 Cochlear Implants 6.10 Conclusions 7 Time and Pitch scale modification of audio signals Jean Laroche 7.1 Introduction 7.2 Notations and definitions 7.2.1 An underlying sinusoidal model for signals 7.2.2 A definition of time-scale and pitch-scale modification 7.3 Frequency-domain techniques 7.3.1 Methods based on the short-time Fourier transform 7.3.2 Methods based on a signal model 7.4 Time-domain techniques 7.4.1 Principle 7.4.2 Pitch independent methods 7.4.3 Periodicity-driven methods 7.5 Formant modification 7.5.1 Time-domain techniques 7.5.2 Frequency-domain techniques 7.6 Discussion 7.6.1 Generic problems associated with time or pitch scaling 7.6.2 Time-domain vs frequency-domain techniques 8 Wavetable Sampling Synthesis Dana C. Massie 8.1 Background and introduction 8.1.1 Transition to Digital 8.1.2 Flourishing of Digital Synthesis Methods 8.1.3 Metrics: The Sampling - Synthesis Continuum 8.1.4 Sampling vs. Synthesis 8.2 Wavetable Sampling Synthesis 8.2.1 Playback of digitized musical instrument events. 
    8.2.2 Entire note, not single period
    8.2.3 Pitch Shifting Technologies
    8.2.4 Looping of sustain
    8.2.5 Multi-sampling
    8.2.6 Enveloping
    8.2.7 Filtering
    8.2.8 Amplitude variations as a function of velocity
    8.2.9 Mixing or summation of channels
    8.2.10 Multiplexed wavetables
  8.3 Conclusion

9 Audio Signal Processing Based on Sinusoidal Analysis/Synthesis (T. F. Quatieri and R. J. McAulay)
  9.1 Introduction
  9.2 Filter Bank Analysis/Synthesis
    9.2.1 Additive Synthesis
    9.2.2 Phase Vocoder
    9.2.3 Motivation for a Sine-Wave Analysis/Synthesis
  9.3 Sinusoidal-Based Analysis/Synthesis
    9.3.1 Model
    9.3.2 Estimation of Model Parameters
    9.3.3 Frame-to-Frame Peak Matching
    9.3.4 Synthesis
    9.3.5 Experimental Results
    9.3.6 Applications of the Baseline System
    9.3.7 Time-Frequency Resolution
  9.4 Source/Filter Phase Model
    9.4.1 Model
    9.4.2 Phase Coherence in Signal Modification
    9.4.3 Revisiting the Filter Bank-Based Approach
  9.5 Additive Deterministic/Stochastic Model
    9.5.1 Model
    9.5.2 Analysis/Synthesis
    9.5.3 Applications
  9.6 Signal Separation Using a Two-Voice Model
    9.6.1 Formulation of the Separation Problem
    9.6.2 Analysis and Separation
    9.6.3 The Ambiguity Problem
    9.6.4 Pitch and Voicing Estimation
  9.7 FM Synthesis
    9.7.1 Principles
    9.7.2 Representation of Musical Sound
    9.7.3 Parameter Estimation
    9.7.4 Extensions
  9.8 Conclusions

10 Principles of Digital Waveguide Models of Musical Instruments (Julius O. Smith III)
  10.1 Introduction
    10.1.1 Antecedents in Speech Modeling
    10.1.2 Physical Models in Music Synthesis
    10.1.3 Summary
  10.2 The Ideal Vibrating String
    10.2.1 The Finite Difference Approximation
    10.2.2 Traveling-Wave Solution
  10.3 Sampling the Traveling Waves
    10.3.1 Relation to Finite Difference Recursion
  10.4 Alternative Wave Variables
    10.4.1 Spatial Derivatives
    10.4.2 Force Waves
    10.4.3 Power Waves
    10.4.4 Energy Density Waves
    10.4.5 Root-Power Waves
  10.5 Scattering at an Impedance Discontinuity
    10.5.1 The Kelly-Lochbaum and One-Multiply Scattering Junctions
    10.5.2 Normalized Scattering Junctions
    10.5.3 Junction Passivity
  10.6 Scattering at a Loaded Junction of N Waveguides
  10.7 The Lossy One-Dimensional Wave Equation
    10.7.1 Loss Consolidation
    10.7.2 Frequency-Dependent Losses
  10.8 The Dispersive One-Dimensional Wave Equation
  10.9 Single-Reed Instruments
    10.9.1 Clarinet Overview
    10.9.2 Single-Reed Theory
  10.10 Bowed Strings
    10.10.1 Violin Overview
    10.10.2 The Bow-String Scattering Junction
  10.11 Conclusions

References
Index
Applications of Digital Signal Processing to Audio and Acoustics
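As a small illustration of the comb and allpass reverberator structures listed under Chapter 3 (Schroeder's reverberator and the parallel comb filter), here is a minimal Python sketch: four feedback comb filters in parallel, summed and passed through two allpass sections in series. The delay lengths and gains are illustrative assumptions for this sketch, not values taken from the book.

```python
import numpy as np

def comb(x, delay, g):
    """Feedback comb filter: y[n] = x[n] + g * y[n - delay]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n] + (g * y[n - delay] if n >= delay else 0.0)
    return y

def allpass(x, delay, g):
    """Schroeder allpass section: y[n] = -g*x[n] + x[n-delay] + g*y[n-delay]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        xd = x[n - delay] if n >= delay else 0.0
        yd = y[n - delay] if n >= delay else 0.0
        y[n] = -g * x[n] + xd + g * yd
    return y

def schroeder_reverb(x):
    # Parallel combs with mutually incommensurate delays (illustrative
    # values), averaged, then diffused by two series allpass sections.
    combs = [(1116, 0.805), (1188, 0.827), (1277, 0.783), (1356, 0.764)]
    wet = sum(comb(x, d, g) for d, g in combs) / len(combs)
    for d, g in [(225, 0.7), (556, 0.7)]:
        wet = allpass(wet, d, g)
    return wet

# Impulse response of the reverberator: a dense train of echoes
# whose energy decays over time because every feedback gain is < 1.
impulse = np.zeros(4000)
impulse[0] = 1.0
h = schroeder_reverb(impulse)
```

The comb delays set the modal and echo density while the allpass sections thicken the echo pattern without coloring the long-term spectrum, which is the basic trade-off the chapter's sections on modal density and echo density discuss.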