Synthetic voice and text-to-speech technology
Author | Message | Computational Linguistics
bot (Guest)
Posts: 317
Reputation: 12
bot :: Tue Sep 08 2009, 23:47
Introduction to Information Retrieval, by Christopher D. Manning
- Spoiler:
List of Tables xv List of Figures xvii Table of Notations xxv Preface xxvii
1 Information retrieval using the Boolean model 1 1.1 An example information retrieval problem 2 1.2 A first take at building an inverted index 5 1.3 Processing Boolean queries 9 1.4 Boolean querying, extended Boolean querying, and ranked retrieval 11 1.5 References and further reading 13 1.6 Exercises 14
2 The dictionary and postings lists 17 2.1 Document delineation and character sequence decoding 17 2.1.1 Obtaining the character sequence in a document 17 2.1.2 Choosing a document unit 18 2.2 Determining dictionary terms 20 2.2.1 Tokenization 20 2.2.2 Dropping common terms: stop words 23 2.2.3 Normalization (equivalence classing of terms) 24 2.2.4 Stemming and lemmatization 28 2.3 Postings lists, revisited 31 2.3.1 Faster postings merges: Skip pointers 31 2.3.2 Phrase queries 32 2.4 References and further reading 36 2.5 Exercises 37
3 Tolerant retrieval 39 3.1 Wildcard queries 39 3.1.1 General wildcard queries 40 3.1.2 k-gram indexes 42 3.2 Spelling correction 43 3.2.1 Implementing spelling correction 43 3.2.2 Forms of spell correction 44 3.2.3 Edit distance 44 3.2.4 k-gram indexes 46 3.2.5 Context sensitive spelling correction 47 3.3 Phonetic correction 48 3.4 References and further reading 50
4 Index construction 51 4.1 Construction of large indexes 51 4.2 Distributed indexing 55 4.3 Dynamic indexing 58 4.4 Other types of indexes 59 4.5 References and further reading 61 4.6 Exercises 62
5 Index compression 65 5.1 Statistical properties of terms in information retrieval 65 5.2 Dictionary compression 68 5.2.1 Dictionary-as-a-string 69 5.2.2 Blocked storage 70 5.3 Postings file compression 73 5.3.1 Variable byte codes 74 5.3.2 γ codes 75 5.4 References and further reading 82 5.5 Exercises 83
6 Scoring and term weighting 87 6.1 Parametric and zone indexes 87 6.1.1 Weighted zone scoring 89 6.2 Term frequency and weighting 89 6.2.1 Inverse document frequency 90 6.2.2 Tf-idf weighting 92 6.3 Variants in weighting functions 92 6.3.1 Sublinear tf scaling 93 6.3.2 Maximum tf normalization 93 6.3.3 The effect of document length 94 6.3.4 Learning weight functions 94 6.3.5 Query-term proximity 96
7 Vector space retrieval 99 7.1 Documents as vectors 99 7.1.1 Inner products 99 7.1.2 Queries as vectors 102 7.1.3 Pivoted normalized document length 103 7.2 Heuristics for efficient scoring and ranking 105 7.2.1 Inexact top K document retrieval 106 7.3 Interaction between vector space and other retrieval methods 109 7.3.1 Query parsing and composite scoring 111 7.4 References and further reading 112
8 Evaluation in information retrieval 113 8.1 Evaluating information retrieval systems and search engines 114 8.2 Standard test collections 115 8.3 Evaluation of unranked retrieval sets 116 8.4 Evaluation of ranked retrieval results 119 8.5 Assessing relevance 123 8.5.1 Document relevance: critiques and justifications of the concept 124 8.5.2 Evaluation heuristics 126 8.6 A broader perspective: System quality and user utility 126 8.6.1 System issues 126 8.6.2 User utility 127 8.7 Results snippets 128 8.8 References and further reading 131
9 Relevance feedback and query expansion 135 9.1 Relevance feedback and pseudo-relevance feedback 136 9.1.1 The Rocchio Algorithm for relevance feedback 136 9.1.2 Probabilistic relevance feedback 141 9.1.3 When does relevance feedback work? 142 9.1.4 Relevance Feedback on the Web 143 9.1.5 Evaluation of relevance feedback strategies 144 9.1.6 Pseudo-relevance feedback 144 9.1.7 Indirect relevance feedback 145 9.1.8 Summary 146 9.2 Global methods for query reformulation 146 9.2.1 Vocabulary tools for query reformulation 146 9.2.2 Query expansion 146 9.2.3 Automatic thesaurus generation 148 9.3 References and further reading 150
10 XML retrieval 151 10.1 Basic XML concepts 152 10.2 Challenges in semistructured retrieval 154 10.3 A vector space model for XML retrieval 157 10.4 Evaluation of XML Retrieval 161 10.5 Text-centric vs. structure-centric XML retrieval 164 10.6 References and further reading 166 10.7 Exercises 166
11 Probabilistic information retrieval 167 11.1 Probability in Information Retrieval 167 11.2 The Probability Ranking Principle 168 11.3 The Binary Independence Model 170 11.3.1 Deriving a ranking function for query terms 171 11.3.2 Probability estimates in theory 172 11.3.3 Probability estimates in practice 173 11.3.4 Probabilistic approaches to relevance feedback 174 11.3.5 PRP and BIM 175 11.4 An appraisal and some extensions 177 11.4.1 Okapi BM25: a non-binary model 177 11.4.2 Bayesian network approaches to IR 178 11.5 References and further reading 179 11.6 Exercises 180 11.6.1 Okapi weighting 180
12 Language models for information retrieval 181 12.1 The Query Likelihood Model 184 12.1.1 Using Query Likelihood Language Models in IR 184 12.1.2 Estimating the query generation probability 185 12.2 Ponte and Croft’s Experiments 187 12.3 Language modeling versus other approaches in IR 187 12.4 Extended language modeling approaches 189 12.5 References and further reading 191
13 Text classification and Naive Bayes 193 13.1 The text classification problem 195 13.2 Naive Bayes text classification 199 13.3 The multinomial versus the binomial model 205 13.4 Properties of Naive Bayes 207 13.5 Feature selection 208 13.5.1 Mutual information 209 13.5.2 χ2 feature selection 211 13.5.3 Frequency-based feature selection 214 13.5.4 Comparison of feature selection methods 214 13.6 Evaluation of text classification 215 13.7 References and further reading 219 13.8 Exercises 220
14 Vector space classification 223 14.1 Rocchio classification 225 14.2 k nearest neighbor 229 14.3 Linear vs. nonlinear classifiers and the bias-variance tradeoff 233 14.3.1 Linear and nonlinear classifiers 233 14.3.2 The bias-variance tradeoff 237 14.4 More than two classes 241 14.5 References and further reading 244 14.6 Exercises 244
15 Support vector machines and kernel functions 247 15.1 Support vector machines: The linearly separable case 247 15.2 Soft margin classification 253 15.3 Nonlinear SVMs 254 15.4 Experimental data 257 15.5 Issues in the categorization of text documents 259 15.6 References and further reading 259
16 Flat clustering 261 16.1 Clustering in information retrieval 262 16.2 Problem statement 266 16.3 Evaluation of clustering 267 16.4 K-means 270 16.4.1 Cluster cardinality in k-means 274 16.5 Model-based clustering 276 16.6 References and further reading 280 16.7 Exercises 281
17 Hierarchical clustering 285 17.1 Hierarchical agglomerative clustering 286 17.2 Single-link and complete-link clustering 289 17.2.1 Time complexity 293 17.3 Group-average agglomerative clustering 296 17.4 Centroid clustering 297 17.5 Cluster labeling 299 17.6 Variants 301 17.7 Implementation notes 302 17.8 References and further reading 303 17.9 Exercises 304
18 Matrix decompositions and Latent Semantic Indexing 307 18.1 Linear algebra review 307 18.1.1 Matrix decompositions 310 18.2 Term-document matrices and singular value decompositions 311 18.3 Low-rank approximations and latent semantic indexing 312 18.4 References and further reading 317
19 Web search basics 319 19.1 Background and history 319 19.2 Web characteristics 321 19.2.1 The web graph 322 19.2.2 Spam 324 19.3 Advertising as the economic model 325 19.4 The search user experience 327 19.4.1 User query needs 328 19.5 Index size and estimation 329 19.6 Near-duplicates and shingling 332 19.6.1 Shingling 333 19.7 References and further reading 336
20 Web crawling and indexes 339 20.1 Overview 339 20.1.1 Features a crawler must provide 339 20.1.2 Features a crawler should provide 340 20.2 Crawling 340 20.2.1 Crawler architecture 341 20.2.2 DNS resolution 344 20.2.3 The URL frontier 346 20.3 Distributing indexes 349 20.4 Connectivity servers 350 20.5 References and further reading 352
21 Link analysis 355 21.1 The web as a graph 355 21.1.1 Anchor text and the web graph 356 21.2 Pagerank 357 21.2.1 Markov chains 359 21.2.2 The Pagerank computation 361 21.2.3 Topic-specific Pagerank 364 21.3 Hubs and Authorities 366 21.3.1 Choosing the subset of the web 369 21.4 References and further reading 370
Bibliography 373 Index 395
- Spoiler:
An Introduction to Information Retrieval (Cambridge University Press)
bot :: Wed Sep 09 2009, 00:04
Linguistic Knowledge and Word Sense Disambiguation, by Tanja Gaustad
Contents
- Spoiler:
Acknowledgements v List of Tables xi List of Figures xv
1 Introduction 1 1.1 Ambiguity in Language 1 1.2 Overview 5
2 Word Sense Disambiguation 7 2.1 Defining Word Senses 8 2.2 Approaches 11 2.2.1 Knowledge-Based Approaches 11 2.2.2 Corpus-Based Approaches 15 2.3 Information Sources 18 2.3.1 PoS Information 21 2.3.2 Syntactic Structure 22 2.3.3 Selectional Preferences 23 2.3.4 Combination of Information Sources 24 2.4 Problem of Evaluation 27 2.4.1 Senseval: A Common Evaluation Framework 29 2.5 General Approach 31
3 Initial Experiments: Pseudowords 33 3.1 Pseudowords 33 3.2 Naive Bayes Classification 34 3.3 Varying Corpus Size 36 3.3.1 Corpus and Pseudowords 36 3.3.2 Underlying Assumptions 36 3.3.3 Results and Evaluation 37 3.4 Varying Thresholds for Context Words 38 3.4.1 Results and Evaluation 39 3.5 Pseudowords versus Real Ambiguous Words 39 3.5.1 Outline of the Problem 39 3.5.2 Way of Proceeding 40 3.5.3 Corpus and Ambiguous Words/Pseudowords 41 3.5.4 Results and Evaluation 43
4 Experimental Setup 47 4.1 Senseval-2 Corpus for Dutch 48 4.2 WSD as Classification Problem 50 4.2.1 Maximum Entropy Classification 50 4.2.2 Smoothing with Gaussian Priors 51 4.3 Building Individual Classifiers 52 4.4 Implementation 54 4.5 Tuning versus Testing 57 4.6 Results and Evaluation 59
5 Lemma-Based Approach 65 5.1 Accurate Stemming of Dutch 66 5.2 Stemmers 68 5.2.1 Dutch Porter Stemmer 69 5.2.2 Stemmer with Dictionary Lookup 69 5.2.3 Stand-Alone Evaluation 70 5.3 Dictionary-Based Lemmatizer for Dutch 72 5.4 Introducing the Lemma-Based Approach 73 5.5 Results and Evaluation 74
6 Impact of Part-of-Speech Information 79 6.1 Application-Oriented Evaluation of Three PoS Taggers 80 6.2 Comparison of PoS Taggers 80 6.2.1 Hidden Markov Model PoS Tagger 81 6.2.2 Memory-Based PoS Tagger 82 6.2.3 Transformation-Based PoS Tagger 83 6.2.4 Stand-Alone Results for the PoS Taggers 84 6.3 Integrating PoS Information 84 6.4 Results and Evaluation 85 6.5 PoS Information in Context 88
7 Impact of Structural Syntactic Information 91 7.1 Prior Work 92 7.2 Dependency Relations 96 7.2.1 Alpino Dependency Parser 97 7.2.2 Dependency Triples as Features 98 7.3 Results and Evaluation 100
8 Final Results on Dutch Senseval-2 Test Data 105 8.1 Summary of Findings on Tuning Data 106 8.2 Results and Evaluation 108
9 Conclusions and Future Work 111 9.1 Conclusions 111 9.2 Future Work 112 9.2.1 Semantic Information 112 9.2.2 EuroWordNet to Acquire More Data 113 9.2.3 Other Languages 115 9.2.4 Applications 115
Bibliography 117 Summary 135 Samenvatting 139
- Spoiler:
Linguistic Knowledge and Word Sense Disambiguation
bot :: Wed Sep 09 2009, 00:11
A Computational Model of Natural Language Communication: Interpretation, Inference, and Production in Database Semantics, by Roland R. Hausser
Everyday life would be easier if we could simply talk with machines instead of having to program them. Before such talking robots can be built, however, there must be a theory of how communicating with natural language works. This requires not only a grammatical analysis of the language signs, but also a model of the cognitive agent, with interfaces for recognition and action, an internal database, and an algorithm for reading content in and out. In Database Semantics, these ingredients are used for reconstructing natural language communication as a mechanism for transferring content from the database of the speaker to the database of the hearer.
Part I of this book presents a high-level description of an artificial agent with which humans can freely communicate in their accustomed language. Part II analyzes the major constructions of natural language, i.e., intra- and extrapropositional functor–argument structure, coordination, and coreference, in both the speaker and the hearer mode. Part III defines declarative specifications for fragments of English, which are used for an implementation in Java.
The book provides researchers, graduate students and software engineers with a functional framework for the theoretical analysis of natural language communication and for all practical applications of natural language processing.
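Read as a plain data-transfer loop, the communication mechanism described above can be sketched in a few lines of Python. This is an illustrative toy under loose assumptions, not Hausser's DBS formalism: the `Agent` class, its `speak`/`hear` methods, and the tuple encoding of propositions are all invented for this example.

```python
# Toy sketch of the Database Semantics communication cycle:
# the speaker reads content out of its internal database as a
# language sign; the hearer reads the content back in.

class Agent:
    """A minimal cognitive agent with an internal database."""

    def __init__(self, name):
        self.name = name
        self.database = []  # stored propositions ("content")

    def speak(self, proposition):
        """Speaker mode: encode stored content as a surface sign."""
        assert proposition in self.database, "can only say what is stored"
        return " ".join(proposition)  # surface = word sequence

    def hear(self, surface):
        """Hearer mode: decode a sign and store its content."""
        self.database.append(tuple(surface.split()))

speaker = Agent("speaker")
hearer = Agent("hearer")
speaker.database.append(("Julia", "sleeps"))

# Communication transfers content from the speaker's database
# to the hearer's database via the language sign.
sign = speaker.speak(("Julia", "sleeps"))
hearer.hear(sign)
print(hearer.database)  # [('Julia', 'sleeps')]
```

The point of the sketch is only the direction of flow: content exists in the speaker's database before the sign is produced, and exists in the hearer's database only after the sign is interpreted.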
Contents
- Spoiler:
Introduction 1
Part I. The Communication Mechanism of Cognition
1. Matters of Method 9 1.1 Sign- or Agent-Oriented Analysis of Language? 9 1.2 Verification Principle 11 1.3 Equation Principle 13 1.4 Objectivation Principle 14 1.5 Equivalence Principles for Interfaces and for Input/Output 16 1.6 Surface Compositionality and Time-Linearity 17
2. Interfaces and Components 21 2.1 Cognitive Agents with and without Language 21 2.2 Modalities and Media 23 2.3 Alternative Ontologies for Referring with Language 25 2.4 Theory of Language and Theory of Grammar 26 2.5 Immediate Reference and Mediated Reference 27 2.6 The SLIM Theory of Language 29
3. Data Structure and Algorithm 35 3.1 Proplets for Coding Propositional Content 35 3.2 Internal Matching between Language and Context Proplets 36 3.3 Storage of Proplets in a Word Bank 38 3.4 Time-Linear Algorithm of LA-Grammar 40 3.5 Cycle of Natural Language Communication 43 3.6 Bare Bone Example of Database Semantics: DBS-letter 46
4. Concept Types and Concept Tokens 51 4.1 Kinds of Proplets 51 4.2 Type–Token Relation for Establishing Reference 54 4.3 Context Recognition 57 4.4 Context Action 58 4.5 Sign Recognition and Production 59 4.6 Universal versus Language-Dependent Properties 61
5. Forms of Thinking 65 5.1 Retrieving Answers to Questions 65 5.2 Episodic versus Absolute Propositions 69 5.3 Inference: Reconstructing Modus Ponens 71 5.4 Indirect Uses of Language 75 5.5 Secondary Coding as Perspective Taking 78 5.6 Shades of Meaning 79
Part II. The Major Constructions of Natural Language
6. Intrapropositional Functor–Argument Structure 87 6.1 Overview 87 6.2 Determiners 89 6.3 Adjectives 94 6.4 Auxiliaries 97 6.5 Passive 98 6.6 Prepositions 100
7. Extrapropositional Functor–Argument Structure 103 7.1 Overview 103 7.2 Sentential Argument as Subject 105 7.3 Sentential Argument as Object 107 7.4 Adnominal Sentential Modifier with Subject Gap 108 7.5 Adnominal Sentential Modifier with Object Gap 111 7.6 Adverbial Sentential Modifier 112
8. Intrapropositional Coordination 115 8.1 Overview 115 8.2 Simple Coordination of Nouns in Subject and Object Position 118 8.3 Simple Coordination of Verbs and of Adjectives 123 8.4 Complex Coordination of Verbs and Objects: Subject Gapping 126 8.5 Complex Coordination of Subjects and Objects: Verb Gapping 130 8.6 Complex Coordination of Subjects and Verbs: Object Gapping 133
9. Extrapropositional Coordination 137 9.1 Overview 137 9.2 Interpretation and Production of Extrapropositional Coordination 138 9.3 Simple Coordinations as Sentential Arguments and Modifiers 141 9.4 Complex Coordinations as Sentential Arguments and Modifiers 147 9.5 Turn-Taking in Questions and Answers 153 9.6 Complex Propositions as Thought Structures 157
10. Intrapropositional and Extrapropositional Coreference 161 10.1 Overview 161 10.2 Intrapropositional Coreference 163 10.3 Langacker–Ross Constraint for Sentential Arguments 165 10.4 Langacker–Ross Constraint for Adnominal Sentential Modifiers 168 10.5 Langacker–Ross Constraint for Adverbial Sentential Modifiers 171 10.6 Handling Pronominal Coreference by Means of Inference 174
Part III. The Declarative Specification of Formal Fragments
11. DBS.1: Hearer Mode 183 11.1 Automatic Word Form Recognition 183 11.2 Lexicon of LA-hear.1 185 11.3 Preamble of LA-hear.1 187 11.4 Definition of LA-hear.1 188 11.5 Interpreting a Sequence of Sentences 191 11.6 Storing the Output of LA-hear.1 in a Word Bank 195
12. DBS.1: Speaker Mode 197 12.1 Definition of LA-think.1 197 12.2 Navigating with LA-think.1 199 12.3 Automatic Word Form Production 202 12.4 Definition of LA-speak.1 203 12.5 Producing a Sequence of Sentences 204 12.6 Summarizing the DBS.1 System 207
13. DBS.2: Hearer Mode 209 13.1 Lexicon of LA-hear.2 209 13.2 Preamble and Definition of LA-hear.2 216 13.3 Interpreting a Sentence with Complex Noun Phrases 220 13.4 Interpreting a Sentence with a Complex Verb Phrase 226 13.5 Interpreting a Sentence with a Three-Place Verb 229 13.6 Storing the Output of LA-hear.2 in a Word Bank 234
14. DBS.2: Speaker Mode 237 14.1 Definition of LA-think.2 237 14.2 Definition of LA-speak.2 240 14.3 Automatic Word Form Production 243 14.4 Producing a Sentence with Complex Noun Phrases 249 14.5 Producing a Sentence with a Complex Verb Phrase 254 14.6 Producing a Sentence with a Three-Place Verb 258
15. DBS.3: Adnominal and Adverbial Modifiers 263 15.1 Interpreting Elementary and Complex Modifiers 263 15.2 ADN and ADA Interpretations of Prepositional Phrases 272 15.3 ADV Interpretation of Prepositional Phrases 277 15.4 Intensifiers in Noun Phrases and Prepositional Phrases 282 15.5 Elementary Adverbs with Intensifiers 288 15.6 Definition of LA-hear.3 291
Appendices
A. Universal Basis of Word Order Variation 303 A.1 Overview of the Basic Railroad System 303 A.2 Incremental Language Production Based on Navigation 307 A.3 Realizing Alternative Word Orders from One-Place Propositions 310 A.4 Realizing Basic SO Word Orders from Two-Place Propositions 312 A.5 Realizing OS Word Orders from Alternative Navigations 316 A.6 Realizing Basic Word Orders from Three-Place Propositions 318
B. Declarative Description of the Motor Procedure 321 B.1 Start State Application 321 B.2 Matching between Proplet Patterns and Language Proplets 324 B.3 Time-Linear Breadth-First Derivation Order 326 B.4 Rule Application and the Basic Structure of the LA-Hear Motor 327 B.5 Operations 330 B.6 Basic Structure of the LA-Think and the LA-Think–Speak Motor 332
C. Glossary 335 C.1 Proplet Attributes 335 C.2 Proplet Values 335 C.3 Variables, Restrictions, and Agreement Conditions 337 C.4 Abstract Surfaces 339 C.5 Rule Names 339 C.6 List of Analyzed Examples 341
Bibliography 347 Name Index 357 Subject Index 361
- Spoiler:
Hausser, R. - A Computational Model Of Natural Language Communication Interpretation, Inference and Production in Database Semantics
bot :: Wed Sep 09 2009, 00:25
The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data, by Ronen Feldman and James Sanger
Text mining tries to solve the crisis of information overload by combining techniques from data mining, machine learning, natural language processing, information retrieval, and knowledge management. In addition to providing an in-depth examination of core text mining and link detection algorithms and operations, this book examines advanced pre-processing techniques, knowledge representation considerations, and visualization approaches. Finally, it explores current real-world, mission-critical applications of text mining and link detection in such varied fields as M&A business intelligence, genomics research and counter-terrorism activities.
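As a small taste of the kind of core operation the handbook formalizes, the sketch below counts concept co-occurrences across a toy document collection, a simplified stand-in for the distribution and association analyses covered in the book. The corpus and the concept sets are invented for this example; a real system would first extract the concepts with the preprocessing techniques the handbook describes.

```python
# Minimal illustration of a core text mining operation: counting
# how often pairs of concepts appear together in the same document.
from collections import Counter
from itertools import combinations

# Each document is represented by its set of extracted concepts
# (invented toy data).
docs = [
    {"merger", "acquisition", "bank"},
    {"gene", "protein", "pathway"},
    {"merger", "bank", "regulator"},
]

# Count each unordered concept pair once per document.
cooccur = Counter()
for concepts in docs:
    for pair in combinations(sorted(concepts), 2):
        cooccur[pair] += 1

# The most frequent pair is the strongest association in the corpus.
print(cooccur.most_common(1))  # [(('bank', 'merger'), 2)]
```

Frequent pairs like this are the raw material for the link analysis and visualization approaches discussed later in the book, where concepts become graph nodes and co-occurrence counts become weighted edges.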
Contents
- Spoiler:
Preface page x
I. Introduction to Text Mining 1 I.1 Defining Text Mining 1 I.2 General Architecture of Text Mining Systems 13
II. Core Text Mining Operations 19 II.1 Core Text Mining Operations 19 II.2 Using Background Knowledge for Text Mining 41 II.3 Text Mining Query Languages 51
III. Text Mining Preprocessing Techniques 57 III.1 Task-Oriented Approaches 58 III.2 Further Reading 62
IV. Categorization 64 IV.1 Applications of Text Categorization 65 IV.2 Definition of the Problem 66 IV.3 Document Representation 68 IV.4 Knowledge Engineering Approach to TC 70 IV.5 Machine Learning Approach to TC 70 IV.6 Using Unlabeled Data to Improve Classification 78 IV.7 Evaluation of Text Classifiers 79 IV.8 Citations and Notes 80
V. Clustering 82 V.1 Clustering Tasks in Text Analysis 82 V.2 The General Clustering Problem 84 V.3 Clustering Algorithms 85 V.4 Clustering of Textual Data 88 V.5 Citations and Notes 92
VI. Information Extraction 94 VI.1 Introduction to Information Extraction 94 VI.2 Historical Evolution of IE: The Message Understanding Conferences and Tipster 96 VI.3 IE Examples 101 VI.4 Architecture of IE Systems 104 VI.5 Anaphora Resolution 109 VI.6 Inductive Algorithms for IE 119 VI.7 Structural IE 122 VI.8 Further Reading 129
VII. Probabilistic Models for Information Extraction 131 VII.1 Hidden Markov Models 131 VII.2 Stochastic Context-Free Grammars 137 VII.3 Maximal Entropy Modeling 138 VII.4 Maximal Entropy Markov Models 140 VII.5 Conditional Random Fields 142 VII.6 Further Reading 145
VIII. Preprocessing Applications Using Probabilistic and Hybrid Approaches 146 VIII.1 Applications of HMM to Textual Analysis 146 VIII.2 Using MEMM for Information Extraction 152 VIII.3 Applications of CRFs to Textual Analysis 153 VIII.4 TEG: Using SCFG Rules for Hybrid Statistical–Knowledge-Based IE 155 VIII.5 Bootstrapping 166 VIII.6 Further Reading 175
IX. Presentation-Layer Considerations for Browsing and Query Refinement 177 IX.1 Browsing 177 IX.2 Accessing Constraints and Simple Specification Filters at the Presentation Layer 185 IX.3 Accessing the Underlying Query Language 186 IX.4 Citations and Notes 187
X. Visualization Approaches 189 X.1 Introduction 189 X.2 Architectural Considerations 192 X.3 Common Visualization Approaches for Text Mining 194 X.4 Visualization Techniques in Link Analysis 225 X.5 Real-World Example: The Document Explorer System 235
XI. Link Analysis 244 XI.1 Preliminaries 244 XI.2 Automatic Layout of Networks 246 XI.3 Paths and Cycles in Graphs 250 XI.4 Centrality 251 XI.5 Partitioning of Networks 259 XI.6 Pattern Matching in Networks 272 XI.7 Software Packages for Link Analysis 273 XI.8 Citations and Notes 274
XII. Text Mining Applications 275 XII.1 General Considerations 276 XII.2 Corporate Finance: Mining Industry Literature for Business Intelligence 281 XII.3 A “Horizontal” Text Mining Application: Patent Analysis Solution Leveraging a Commercial Text Analytics Platform 297 XII.4 Life Sciences Research: Mining Biological Pathway Information with GeneWays 309
Appendix A: DIAL: A Dedicated Information Extraction Language for Text Mining 317 A.1 What Is the DIAL Language? 317 A.2 Information Extraction in the DIAL Environment 318 A.3 Text Tokenization 320 A.4 Concept and Rule Structure 320 A.5 Pattern Matching 322 A.6 Pattern Elements 323 A.7 Rule Constraints 327 A.8 Concept Guards 328 A.9 Complete DIAL Examples 329
Bibliography 337 Index 391
- Spoiler:
The Text Mining Handbook: Advanced Approaches In Analyzing Unstructured Data
bot :: Thu Sep 17 2009, 00:28
The Integration of Phonetic Knowledge in Speech Technology (Text, Speech and Language Technology), by William J. Barry and Wim A. van Dommelen
Description:
Continued progress in Speech Technology in the face of ever-increasing demands on the performance levels of applications is a challenge to the whole speech and language science community. Robust recognition and understanding of spontaneous speech in varied environments, good comprehensibility and naturalness of expressive speech synthesis are goals that cannot be achieved without a change of paradigm. This book argues for interdisciplinary communication and cooperation in problem-solving in general, and discusses the interaction between speech and language engineering and phonetics in particular. With a number of reports on innovative speech technology research as well as more theoretical discussions, it addresses the practical, scientific and sometimes the philosophical problems that stand in the way of cross-disciplinary collaboration and illuminates some of the many possible ways forward.
Audience: researchers and professionals in speech technology, and computational linguists.
Table of Contents
Foreword
Phonetic Knowledge in Speech Technology – and Phonetic Knowledge from Speech Technology
Can Phonetic Knowledge be Used to Improve the Performance of Speech Recognisers and Synthesisers
Prosodic Models, Automatic Speech Understanding, and Speech Synthesis: Towards the Common Ground
Phonetic Time Maps
Introducing Phonetically Motivated, Heterogeneous Information into Automatic Speech Recognition
Introducing Contextual Transcription Rules in Large Vocabulary Speech Recognition
From Here to Utility
Pronunciation Modeling
Phonetic Knowledge in Text-to-Speech Synthesis
Is Phonetic Knowledge of Any Use for Speech Technology?
- Spoiler:
The Integration of Phonetic Knowledge in Speech Technology (Text, Speech and Language Technology)
bot :: Fri Sep 25 2009, 19:15
Computational Linguistics: An Introduction, by R. Grishman
- Spoiler:
Introduction 1 What is computational linguistics? 1.1 The objectives of computational linguistics 1.2 Computational and theoretical linguistics 1.3 Computational linguistics as engineering 1.4 The structure of this survey - a tree diagram
2 Syntax analysis 2.1 The role of syntax analysis 2.2 Is syntax analysis necessary? 2.3 Phrase-structure languages 2.3.1 Recursive languages 2.3.2 Regular grammars 2.3.3 Context-free grammar 2.3.4 Context-sensitive grammars 2.4. Early systems: context-free parsers 2.4.1 A small context-free natural language grammar 2.4.2 Parsing algorithms for context-free grammars 2.4.3 Some early systems 2.5. Transformational analyzers: first systems 2.5.1 Transformational grammar 2.5.2 A small transformational grammar 2.5.3 Transformational parsers - an overview 2.5.4 Transformational parsers - some details 2.6. Augmented context-free parsers 2.6.1 Restriction Language 2.6.2 Augmented transition networks 2.6.3 Some history 2.6.4 Some comparisons 2.6.5 PROLOG 2.7. Other phrase-structure grammars 2.7.1 Context-sensitive grammar 2.7.2 Unrestricted phrase-structure grammar 2.7.3 Grammar and metagrammar 2.8 Analyzing adjuncts 2.9 Analyzing coordinate conjunction 2.10 Parsing with probability and graded acceptability
3 Semantic analysis 3.1 Formal languages for meaning representation 3.1.1 Propositional logic 3.1.2 Predicate logic 3.1.3 Restricted quantification 3.1.4 Semantic nets 3.1.5 Notions not captured in predicate logic 3.1.6 Choice of predicates 3.2 Translation to logical form 3.2.1 The input to the translation 3.2.2 Historical notes 3.2.3 Quantifier ordering 3.3 Semantic constraints 3.3.1 The nature of the constraints 3.3.2 Sublanguages 3.3.3 Specifying the constraints 3.3.4 Enforcing the constraints 3.4 Conceptual analyzers 3.5 Anaphora resolution 3.5.1 When to do anaphora resolution 3.5.2 Computing discourse entities 3.5.3 Selecting the referent 3.5.4 Other anaphoric noun phrases 3.5.5 Definite noun phrases 3.5.6 Indefinite pronouns and noun phrases 3.6 Analyzing sentence fragments 3.7 Using the logical form
4 Discourse analysis and information structuring 4.1 Text grammar 4.2 Organizing world knowledge: Grouping facts by topic 4.3 Frames 4.4 Analyzing narrative: scripts and plans 4.4.1 Scripts 4.4.2 Plans 4.4.3 MOPs, story points, and plot units 4.5 Information formats 4.6 Analyzing dialog 4.6.1 A mixed initiative system 4.6.2 Planning to say something
5 Language generation 5.1 The poor cousin 5.2 Sentence generation 5.2.1 From logical form to deep structure 5.2.2 From deep structure to sentence 5.3 Text generation 5.3.1 Organizing the text 5.3.2 What's best left unsaid
Exercises Bibliography Name index Subject index
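The context-free parsing algorithms of Chapter 2 (2.4.2) can be sketched in a few lines. The following is a CYK-style recognizer for a toy grammar in Chomsky normal form; the grammar, lexicon, and function names are invented for this sketch and are not taken from Grishman's text.

```python
# A toy illustration of context-free parsing: a CYK recognizer for a
# small grammar in Chomsky normal form (hypothetical grammar/lexicon).

# Rules A -> B C (binary) and A -> word (lexical).
BINARY = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
LEXICAL = {
    "the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "saw": {"V"},
}

def cyk_recognize(words, start="S"):
    """Return True iff the word sequence derives from `start`."""
    n = len(words)
    # chart[i][j] = set of nonterminals covering words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(LEXICAL.get(w, ()))
    for span in range(2, n + 1):          # widen spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):     # try every split point
                for b in chart[i][k]:
                    for c in chart[k][j]:
                        chart[i][j] |= BINARY.get((b, c), set())
    return start in chart[0][n]

print(cyk_recognize("the dog saw the cat".split()))  # True
```

The chart-filling loop makes the cubic cost of CFG parsing visible: every span, every split point, every rule.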
- Spoiler:
bot (Guest)
Posts: 317
Reputation: 12
bot :: Fri Sep 25 2009, 19:34
Mathematical Methods in Linguistics Partee B.H., Meulen A.T., Wall R.E.
- Spoiler:
PART A: SET THEORY CHAPTER 1: BASIC CONCEPTS OF SET THEORY 1.1. The concept of a set 1.2. Specification of sets 1.3. Set-theoretic identity and cardinality 1.4. Subsets 1.5. Power sets 1.6. Union and intersection 1.7. Difference and complement 1.8. Set-theoretic equalities Exercises
CHAPTER 2: RELATIONS AND FUNCTIONS 2.1. Ordered pairs and Cartesian products 2.2. Relations 2.3. Functions 2.4. Composition Exercises
CHAPTER 3: PROPERTIES OF RELATIONS 3.1. Reflexivity, symmetry, transitivity, and connectedness 3.2. Diagrams of relations 3.3. Properties of inverses and complements 3.4. Equivalence relations and partitions 3.5. Orderings Exercises
CHAPTER 4: INFINITIES 4.1. Equivalent sets and cardinality 4.2. Denumerability of sets 4.3. Nondenumerable sets 4.4. Infinite vs. unbounded Exercises
APPENDIX A: SET-THEORETIC RECONSTRUCTION OF NUMBER SYSTEMS A.1. The natural numbers A.2. Extension to the set of all integers A.3. Extension to the set of all rational numbers A.4. Extension to the set of all real numbers REVIEW EXERCISES
PART B: LOGIC AND FORMAL SYSTEMS CHAPTER 5: BASIC CONCEPTS OF LOGIC AND FORMAL SYSTEMS 5.1. Formal systems and models 5.2. Natural languages and formal languages 5.3. Syntax and semantics 5.4. About statement logic and predicate logic
CHAPTER 6: STATEMENT LOGIC 6.1. Syntax 6.2. Semantics: Truth values and truth tables 6.2.1. Negation 6.2.2. Conjunction 6.2.3. Disjunction 6.2.4. The Conditional 6.2.5. The Biconditional 6.3. Tautologies, contradictions and contingencies 6.4. Logical equivalence, logical consequence and laws 6.5. Natural deduction 6.5.1. Conditional Proof 6.5.2. Indirect Proof 6.6. Beth Tableaux Exercises
CHAPTER 7: PREDICATE LOGIC 7.1. Syntax 7.2. Semantics 7.3. Quantifier laws and prenex normal form 7.4. Natural deduction 7.5. Beth Tableaux 7.6. Formal and informal proofs 7.7. Informal style in mathematical proofs Exercises
CHAPTER 8: FORMAL SYSTEMS, AXIOMATIZATION, AND MODEL THEORY 8.1. The syntactic side of formal systems 8.1.1. Recursive definitions 8.2. Axiomatic systems and derivations 8.2.1. Extended axiomatic systems 8.3. Semi-Thue systems 8.4. Peano's axioms and proof by induction 8.5. The semantic side of formal systems: model theory 8.5.1. Theories and models 8.5.2. Consistency, completeness, and independence 8.5.3. Isomorphism 8.5.4. An elementary formal system 8.5.5. Axioms for ordering relations 8.5.6. Axioms for string concatenation 8.5.7. Models for Peano's axioms 8.5.8. Axiomatization of set theory 8.6. Axiomatizing logic 8.6.1. An axiomatization of statement logic 8.6.2. Consistency and independence proofs 8.6.3. An axiomatization of predicate logic 8.6.4. About completeness proofs 8.6.5. Decidability 8.6.6. Gödel's incompleteness theorems 8.6.7. Higher-order logic Exercises APPENDIX B-I: ALTERNATIVE NOTATIONS AND CONNECTIVES APPENDIX B-II: KLEENE'S THREE-VALUED LOGIC REVIEW EXERCISES
PART C: ALGEBRA CHAPTER 9: BASIC CONCEPTS OF ALGEBRA 9.1. Definition of algebra 9.2. Properties of operations 9.3. Special elements 9.4. Maps and morphisms Exercises
CHAPTER 10: OPERATIONAL STRUCTURES 10.1. Groups 10.2. Subgroups, semigroups and monoids 10.3. Integral domains 10.4. Morphisms Exercises
CHAPTER 11: LATTICES 11.1. Posets, duality and diagrams 11.2. Lattices, semilattices and sublattices 11.3. Morphisms in lattices 11.4. Filters and ideals 11.5. Complemented, distributive and modular lattices Exercises
CHAPTER 12: BOOLEAN AND HEYTING ALGEBRAS 12.1. Boolean algebras 12.2. Models of BA 12.3. Representation by sets 12.4. Heyting algebra 12.5. Kripke semantics Exercises REVIEW EXERCISES
PART D: ENGLISH AS A FORMAL LANGUAGE CHAPTER 13: BASIC CONCEPTS 13.1. Compositionality 13.1.1. A compositional account of statement logic 13.1.2. A compositional account of predicate logic 13.1.3. Natural language and compositionality 13.2. Lambda-abstraction 13.2.1. Type theory 13.2.2. The syntax and semantics of lambda-abstraction 13.2.3. A sample fragment 13.2.4. The lambda-calculus 13.2.5. Linguistic applications Exercises
CHAPTER 14: GENERALIZED QUANTIFIERS 14.1. Determiners and quantifiers 14.2. Conditions and quantifiers 14.3. Properties of determiners and quantifiers 14.4. Determiners as relations 14.5. Context and quantification Exercises
CHAPTER 15: INTENSIONALITY 15.1. Frege's two problems 15.2. Forms of opacity 15.3. Indices and accessibility relations 15.4. Tense and time 15.5. Indexicality Exercises
PART E: LANGUAGES, GRAMMARS, AND AUTOMATA CHAPTER 16: BASIC CONCEPTS 16.1. Languages, grammars and automata 16.2. Grammars 16.3. Trees 16.3.1. Dominance 16.3.2. Precedence 16.3.3. Labeling 16.4. Grammars and trees 16.5. The Chomsky Hierarchy 16.6. Languages and automata
CHAPTER 17: FINITE AUTOMATA, REGULAR LANGUAGES AND TYPE 3 GRAMMARS 17.1. Finite automata 17.1.1. State diagrams of finite automata 17.1.2. Formal definition of deterministic finite automata 17.1.3. Non-deterministic finite automata 17.1.4. Formal definition of non-deterministic finite automata 17.1.5. Equivalence of deterministic and non-deterministic finite automata 17.2. Regular languages 17.2.1. Pumping Theorem for fal's 17.3. Type 3 grammars and finite automaton languages 17.3.1. Properties of regular languages 17.3.2. Inadequacy of right-linear grammars for natural languages Exercises
CHAPTER 18: PUSHDOWN AUTOMATA, CONTEXT FREE GRAMMARS AND LANGUAGES 18.1. Pushdown automata 18.2. Context free grammars and languages 18.3. Pumping Theorem for cfl's 18.4. Closure properties of context free languages 18.5. Decidability questions for context free languages 18.6. Are natural languages context free? Exercises
CHAPTER 19: TURING MACHINES, RECURSIVELY ENUMERABLE LANGUAGES AND TYPE 0 GRAMMARS 19.1. Turing machines 19.1.1. Formal definitions 19.2. Equivalent formulations of Turing machines 19.3. Unrestricted grammars and Turing machines 19.4. Church's Hypothesis 19.5. Recursive versus recursively enumerable sets 19.6. The universal Turing machine 19.7. The Halting Problem for Turing machines Exercises
CHAPTER 20: LINEAR BOUNDED AUTOMATA, CONTEXT SENSITIVE LANGUAGES AND TYPE 1 GRAMMARS 20.1. Linear bounded automata 20.1.1. Lba's and context sensitive grammars 20.2. Context sensitive languages and recursive sets 20.3. Closure and decision properties Exercises
CHAPTER 21: LANGUAGES BETWEEN CONTEXT FREE AND CONTEXT SENSITIVE 21.1. Indexed grammars 21.2. Tree adjoining grammars 21.3. Head grammars 21.4. Categorial grammars
CHAPTER 22: TRANSFORMATIONAL GRAMMARS APPENDIX E-I: THE CHOMSKY HIERARCHY APPENDIX E-II: SEMANTIC AUTOMATA REVIEW EXERCISES SOLUTIONS TO SELECTED EXERCISES Part A Chapter 1 Chapter 2 Chapter 3 Chapter 4 Review Problems, Part A Part B Chapter 6 Chapter 7 Chapter 8 Review Problems, Part B Part C Chapter 9 Chapter 10 Chapter 11 Chapter 12 Review Exercises, Part C Part D Chapter 13 Chapter 14 Chapter 15 Part E Chapter 17 Chapter 18 Chapter 19 Chapter 20 Appendix E-II Review Problems, Part E BIBLIOGRAPHY INDEX
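The truth-table semantics of Chapter 6 lends itself to a short executable sketch: enumerate every assignment of truth values and test whether a formula holds under all of them (a tautology). Encoding formulas as Python boolean functions is this sketch's own convention, not anything from the book.

```python
# Brute-force truth tables over all valuations, as in Chapter 6's
# semantics of statement logic; formulas are boolean functions.
from itertools import product

def is_tautology(formula, nvars):
    """True iff `formula` is true under every assignment of truth values."""
    return all(formula(*vals) for vals in product([True, False], repeat=nvars))

def implies(p, q):
    """The conditional of 6.2.4: p -> q is false only when p and not q."""
    return (not p) or q

# Law of contraposition: (p -> q) is equivalent to (~q -> ~p).
contraposition = lambda p, q: implies(p, q) == implies(not q, not p)

print(is_tautology(contraposition, 2))              # True
print(is_tautology(lambda p, q: implies(p, q), 2))  # False: fails at p=T, q=F
```

With n variables this checks 2^n rows, which is exactly the truth-table method's cost; Beth tableaux (6.6) exist to avoid that enumeration.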
- Spoiler:
bot (Guest)
Posts: 317
Reputation: 12
bot :: Fri Sep 25 2009, 20:05
Analyzing Linguistic Data: A Practical Introduction to Statistics Using R R. H. Baayen
Contents
- Spoiler:
Preface x 1 An introduction to R 1 1.1 R as a calculator 2 1.2 Getting data into and out of R 4 1.3 Accessing information in data frames 6 1.4 Operations on data frames 10 1.4.1 Sorting a data frame by one or more columns 10 1.4.2 Changing information in a data frame 12 1.4.3 Extracting contingency tables from data frames 13 1.4.4 Calculations on data frames 15 1.5 Session management 18
2 Graphical data exploration 20 2.1 Random variables 20 2.2 Visualizing single random variables 21 2.3 Visualizing two or more variables 32 2.4 Trellis graphics 37
3 Probability distributions 44 3.1 Distributions 44 3.2 Discrete distributions 44 3.3 Continuous distributions 57 3.3.1 The normal distribution 58 3.3.2 The t, F, and χ² distributions 63
4 Basic statistical methods 68 4.1 Tests for single vectors 71 4.1.1 Distribution tests 71 4.1.2 Tests for the mean 75 4.2 Tests for two independent vectors 77 4.2.1 Are the distributions the same? 78 4.2.2 Are the means the same? 79 4.2.3 Are the variances the same? 81 4.3 Paired vectors 82 4.3.1 Are the means or medians the same? 82 4.3.2 Functional relations: linear regression 84 4.3.3 What does the joint density look like? 97 4.4 A numerical vector and a factor: analysis of variance 101 4.4.1 Two numerical vectors and a factor: analysis of covariance 108 4.5 Two vectors with counts 111 4.6 A note on statistical significance 114
5 Clustering and classification 118 5.1 Clustering 118 5.1.1 Tables with measurements: principal components analysis 118 5.1.2 Tables with measurements: factor analysis 126 5.1.3 Tables with counts: correspondence analysis 128 5.1.4 Tables with distances: multidimensional scaling 136 5.1.5 Tables with distances: hierarchical cluster analysis 138 5.2 Classification 148 5.2.1 Classification trees 148 5.2.2 Discriminant analysis 154 5.2.3 Support vector machines 160
6 Regression modeling 165 6.1 Introduction 165 6.2 Ordinary least squares regression 169 6.2.1 Nonlinearities 174 6.2.2 Collinearity 181 6.2.3 Model criticism 188 6.2.4 Validation 193 6.3 Generalized linear models 195 6.3.1 Logistic regression 195 6.3.2 Ordinal logistic regression 208 6.4 Regression with breakpoints 214 6.5 Models for lexical richness 222 6.6 General considerations 236
7 Mixed models 241 7.1 Modeling data with fixed and random effects 242 7.2 A comparison with traditional analyses 259 7.2.1 Mixed-effects models and quasi-F 260 7.2.2 Mixed-effects models and Latin Square designs 266 7.2.3 Regression with subjects and items 269 7.3 Shrinkage in mixed-effects models 275 7.4 Generalized linear mixed models 278 7.5 Case studies 284 7.5.1 Primed lexical decision latencies for Dutch neologisms 284 7.5.2 Self-paced reading latencies for Dutch neologisms 287 7.5.3 Visual lexical decision latencies of Dutch eight-year-olds 289 7.5.4 Mixed-effects models in corpus linguistics 295
Appendix A Solutions to the exercises 303 Appendix B Overview of R functions 335 References 342 Index 347 Index of data sets 347 Index of R 347 Index of topics 349 Index of authors 352
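The simple linear regression of Section 4.3.2 (R's lm(y ~ x)) reduces to two closed-form estimates: slope = cov(x, y)/var(x) and intercept = mean(y) − slope·mean(x). A minimal sketch in plain Python rather than R, with toy numbers standing in for the book's data sets:

```python
# Ordinary least squares for one predictor, written out from the
# closed-form estimates (an illustration, not R's lm implementation).

def ols_fit(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # slope = sum((x - mean x)(y - mean y)) / sum((x - mean x)^2)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

# Toy data: reaction times (ms) against log word frequency.
freq = [1.0, 2.0, 3.0, 4.0]
rt = [640.0, 620.0, 600.0, 580.0]
b0, b1 = ols_fit(freq, rt)
print(b0, b1)  # 660.0 -20.0
```

In R the same fit is `coef(lm(rt ~ freq))`; the point of the sketch is only that the two coefficients come from means, a covariance, and a variance.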
- Spoiler:
bot (Guest)
Posts: 317
Reputation: 12
bot :: Mon Sep 28 2009, 20:32
Computational Linguistics: Models, Resources, Applications Igor A. Bolshakov, Alexander Gelbukh http://www.gelbukh.com
Abstract: Can computers meaningfully process human language? If this is difficult, why? If it is possible, how? This book introduces the reader to the fascinating science of computational linguistics and automatic natural language processing, which combines linguistics and artificial intelligence. The main part of the book is devoted to explaining the inner workings of a linguistic processor, a software module in charge of translating natural language input into a representation directly usable by traditional artificial intelligence applications and, vice versa, of translating their answers into human language. The book's overall emphasis is on a well-elaborated computational linguistic model called Meaning-Text Theory, which for a number of historical reasons remains little known in the literature; for comparison, other models and formalisms are considered in detail. The book is mainly oriented to researchers and students interested in applying natural language processing techniques to the Spanish language. In particular, most of the examples in the book deal with Spanish language material, a feature that distinguishes it from other books on natural language processing. However, the main exposition is sufficiently general to be applicable to a wide range of languages. Specifically, the authors took into account that many readers will be Spanish native speakers; for them, comments on the English terminology, as well as a short English-Spanish dictionary of technical terms used in the book, are included. Still, reading the book in English will help Spanish-speaking readers become familiar with the style and terminology of the scientific literature on the subject. Contents
- Spoiler:
Preface A new book on computational linguistics Objectives and intended readers of the book Coordination with computer science Coordination with artificial intelligence Selection of topics Web resources for this book Acknowledgments
I. Introduction The role of natural language processing Linguistics and its structure What we mean by computational linguistics Word, what is it? The important role of the fundamental science Current state of applied research on Spanish Conclusions
II. A Historical Outline The structuralist approach Initial contribution of Chomsky A simple context-free grammar Transformational grammars The linguistic research after Chomsky: Valencies and interpretation Linguistic research after Chomsky: Constraints Head-driven Phrase Structure Grammar The idea of unification The Meaning ⇔ Text Theory: Multistage transformer and government patterns The Meaning ⇔ Text Theory: Dependency trees The Meaning ⇔ Text Theory: Semantic links Conclusions
III. Products of Computational Linguistics: Present and Prospective Classification of applied linguistic systems Automatic hyphenation Spell checking Grammar checking Style checking References to words and word combinations Information retrieval Topical summarization Automatic translation Natural language interface Extraction of factual data from texts Text generation Systems of language understanding Related systems Conclusions
IV. Language as a Meaning ⇔ Text Transformer Possible points of view on natural language Language as a bi-directional transformer Text, what is it? Meaning, what is it? Two ways to represent Meaning Decomposition and atomization of Meaning Not-uniqueness of Meaning ⇒ Text mapping: Synonymy Not-uniqueness of Text ⇒ Meaning mapping: Homonymy More on homonymy Multistage character of the Meaning ⇔ Text transformer Translation as a multistage transformation Two sides of a sign Linguistic sign Linguistic sign in the MTT Linguistic sign in HPSG Are signifiers given by nature or by convention? Generative, MTT, and constraint ideas in comparison Conclusions
V. Linguistic Models What is modeling in general? Neurolinguistic models Psycholinguistic models Functional models of language Research linguistic models Common features of modern models of language Specific features of the Meaning ⇔ Text Model Reduced models Do we really need linguistic models? Analogy in natural languages Empirical versus rationalist approaches Limited scope of the modern linguistic theories Conclusions Exercises Review questions Problems recommended for exams
Literature Recommended literature Additional literature General grammars and dictionaries References Appendices Some Spanish-oriented groups and resources English-Spanish dictionary of terminology Index of illustrations Index of authors, systems, and terminology
- Spoiler:
Introduction to Arabic Natural Language Processing Nizar Y. Habash
Contents
- Spoiler:
Preface xv Acknowledgments xvii
1 What is “Arabic”? 1 1.1 Arabic Language and Arabic Dialects 1 1.2 Arabic Script 2 1.3 This Book 3
2 Arabic Script 5 2.1 Elements of the Arabic Script 5 2.1.1 Letters 5 2.1.2 Diacritics 11 2.1.3 Digits 12 2.1.4 Punctuation and Other Symbols 14 2.1.5 Arabic Script Extensions 14 2.1.6 Arabic Typography 15 2.2 Arabic Encoding, Input and Display 16 2.2.1 Arabic Input/Output Support 17 2.2.2 Arabic Encodings 18 2.3 NLP Tasks 20 2.3.1 Orthographic Transliteration 20 2.3.2 Orthographic Normalization 21 2.3.3 Handwriting Recognition 23 2.3.4 Automatic Diacritization 24 2.4 Further Readings 24
3 Arabic Phonology and Orthography 27 3.1 Arabic Phonology 27 3.1.1 Basic Concepts 27 3.1.2 A Sketch of Arabic Phonology 28 3.1.3 Phonological Variations among Arabic Dialects and MSA 30 3.2 Arabic Orthography 31 3.2.1 Optional Diacritics 31 3.2.2 Hamza Spelling 32 3.2.3 Morpho-phonemic Spelling 33 3.2.4 Standardization Issues 34 3.3 NLP Tasks 35 3.3.1 Proper Name Transliteration 35 3.3.2 Spelling Correction 36 3.3.3 Speech Recognition and Synthesis 37 3.4 Further Readings 37
4 Arabic Morphology 39 4.1 Basic Concepts 39 4.1.1 Form-Based Morphology 39 4.1.2 Functional Morphology 44 4.1.3 Form-Function Independence 46 4.2 A Sketch of Arabic Word Morphology 47 4.2.1 Cliticization Morphology 47 4.2.2 Inflectional Morphology 50 4.2.3 Derivational Morphology 58 4.2.4 Morphophonemic and Orthographic Adjustments 59 4.3 Further Readings 63
5 Computational Morphology Tasks 65 5.1 Basic Concepts 65 5.2 Morphological Analysis and Generation 67 5.2.1 Dimensions of Variation 68 5.2.2 Bama: Buckwalter Arabic Morphological Analyzer 70 5.2.3 Almorgeana: Arabic Lexeme-based Morphological Generation and Analysis 71 5.2.4 Magead: Morphological Analysis and Generation for Arabic and its Dialects 72 5.2.5 ElixirFM: Elixir Arabic Functional Morphology 75 5.3 Tokenization 76 5.3.1 Tokenization Schemes and Techniques 76 5.3.2 Detokenization 77 5.3.3 Various Tokenization Schemes 77 5.4 POS Tagging 79 5.4.1 The Buckwalter Tag Set 80 5.4.2 Reduced Buckwalter Tag Sets: Bies, Kulick, ERTS 80 5.4.3 The CATiB POS Tag Set 83 5.4.4 The Khoja Tag Set 84 5.4.5 The PADT Tag Set 84 5.5 Two Tool Suites 86 5.5.1 MADA+TOKAN 86 5.5.2 AMIRA 89 5.5.3 Comparing Mada+Tokan with Amira 91
6 Arabic Syntax 93 6.1 A Sketch of Arabic Syntactic Structures 93 6.1.1 A Note on Morphology and Syntax 93 6.1.2 Sentence Structure 93 6.1.3 Nominal Phrase Structure 99 6.1.4 Prepositional Phrases 104 6.2 Arabic Treebanks 104 6.2.1 The Penn Arabic Treebank 105 6.2.2 The Prague Arabic Dependency Treebank 106 6.2.3 Columbia Arabic Treebank 108 6.2.4 Comparison: PATB, PADT and CATiB 108 6.2.5 A Forest of Treebanks 111 6.3 Syntactic Applications 112 6.4 Further Readings 112
7 A Note on Arabic Semantics 113 7.1 A Brief Note on Terminology 113 7.2 Arabic PropBank 114 7.3 Arabic WordNet 115 7.4 Arabic Resources for Information Extraction 116 7.5 Further Readings 117
8 A Note on Arabic and Machine Translation 119 8.1 Basic Concepts of Machine Translation 119 8.2 A Multilingual Comparison 120 8.2.1 Orthography 120 8.2.2 Morphology 121 8.2.3 Syntax 122 8.3 State of the Field of Arabic MT 123 8.4 Further Readings 124
A Arabic NLP Repositories and Networking Resources 125 A.1 Repositories 125 A.1.1 Resource Distributors 125 A.1.2 Research Paper Repositories 125 A.1.3 Collections of Links 125 A.2 Networking and Conferences 126 A.2.1 Professional Networks 126 A.2.2 Conferences and Workshops 126
B Arabic NLP Books and References 129 B.1 Linguistics 129 B.2 Paper/Scanned Dictionaries 130 B.3 Computational Linguistics 130 B.4 Tutorials and Lectures 131
C Arabic NLP Corpora and Lexica 133 C.1 Speech Corpora 133 C.2 Arabic Handwriting Recognition Corpora and Evaluations 134 C.3 Text Corpora 134 C.3.1 Monolingual Text 134 C.3.2 Parallel Text 135 C.3.3 POS Tagged and/or Diacritized Text 136 C.3.4 Annotations for Information Extraction and Retrieval 136 C.3.5 Treebanks 136 C.4 Evaluation Corpora 137 C.5 Lexical Databases 137 C.5.1 Monolingual Dictionaries 137 C.5.2 Multilingual Dictionaries 137 C.5.3 Morphological Lexica 138 C.5.4 Root lists 138 C.5.5 Phonetic Databases 138 C.5.6 Gazetteers 138 C.5.7 Semantic Ontologies 139
D Arabic NLP Tools 141 D.1 Stemming 141 D.2 Morphological Analysis and Generation 141 D.3 Morphological Disambiguation and POS Tagging 141 D.4 Parsers 142 D.5 Typesetting 142 D.6 Named Entity Recognition 142 D.7 Tree Editing 142 D.8 Lexicography 142 D.9 Text Entry 142 D.10 Machine Translation 142
E Important Arabic NLP Acronyms 143 Bibliography 147 Author's Biography 167
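The orthographic normalization task of Section 2.3.2 can be sketched very compactly: strip the optional diacritics and collapse common letter variants. The particular mappings below (hamzated alefs to bare alef, alef maqsura to ya, teh marbuta to ha) are common choices in IR and MT pipelines but are this sketch's assumption, not a specification from the book.

```python
# A minimal Arabic orthographic normalizer: remove tashkeel diacritics
# and merge frequently conflated letter variants (illustrative mappings).
import re

# Tanween, harakat, shadda, sukun (U+064B-U+0652) and dagger alef (U+0670).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")

LETTER_MAP = str.maketrans({
    "\u0622": "\u0627",  # alef madda        -> bare alef
    "\u0623": "\u0627",  # alef hamza above  -> bare alef
    "\u0625": "\u0627",  # alef hamza below  -> bare alef
    "\u0649": "\u064A",  # alef maqsura      -> ya
    "\u0629": "\u0647",  # teh marbuta       -> ha
})

def normalize(text: str) -> str:
    """Strip optional diacritics, then collapse letter variants."""
    return DIACRITICS.sub("", text).translate(LETTER_MAP)

# Hamzated alef + fatha + kaf + lam normalizes to bare alef + kaf + lam.
print(normalize("\u0623\u064E\u0643\u0644"))  # "اكل"
```

Such normalization deliberately loses information (it conflates distinct words), which is why Section 2.3.4 treats the inverse problem, automatic diacritization, as a task of its own.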
- Spoiler:
Semantic Domains in Computational Linguistics Carlo Strapparava Alfio Gliozzo
Contents
- Spoiler:
1 Introduction 1 1.1 Lexical Semantics and Text Understanding 3 1.2 Semantic Domains: Computational Models for Lexical Semantics 4 1.3 Structure of the Book 5 1.3.1 Semantic Domains 6 1.3.2 Domain Models 7 1.3.3 Semantic Domains in Text Categorization 8 1.3.4 Semantic Domains in Word Sense Disambiguation 9 1.3.5 Multilingual Domain Models 11 1.3.6 Kernel Methods for Natural Language Processing 12
2 Semantic Domains 13 2.1 The Theory of Semantic Fields 14 2.2 Semantic Fields and the meaning-is-use View 18 2.3 Semantic Domains 20 2.4 The Domain Set 22 2.5 WordNet Domains 23 2.6 Lexical Coherence: A Bridge from the Lexicon to the Texts 25 2.7 Computational Models for Semantic Domains 29
3 Domain Models 33 3.1 Domain Models: Definition 33 3.2 The Vector Space Model 34 3.3 The Domain Space 36 3.4 WordNet-Based Domain Models 38 3.5 Corpus-Based Acquisition of Domain Models 40 3.6 Latent Semantic Analysis for Term Clustering 41 3.7 The Domain Kernel 44 3.7.1 Domain Features in Supervised Learning 44 3.7.2 The Domain Kernel 46
4 Semantic Domains in Text Categorization 49 4.1 Domain Kernels for Text Categorization 49 4.1.1 Semi-supervised Learning in Text Categorization 50 4.1.2 Evaluation 51 4.1.3 Discussion 55 4.2 Intensional Learning 56 4.2.1 Intensional Learning for Text Categorization 56 4.2.2 Domain Models and the Gaussian Mixture Algorithm for Intensional Learning 58 4.2.3 Evaluation 62 4.2.4 Discussion 67 4.3 Summary 68
5 Semantic Domains in Word Sense Disambiguation 69 5.1 The Word Sense Disambiguation Task 70 5.2 The Knowledge Acquisition Bottleneck in Supervised WSD 73 5.3 Semantic Domains in the WSD Literature 74 5.4 Domain-Driven Disambiguation 76 5.4.1 Methodology 76 5.4.2 Evaluation 77 5.5 Domain Kernels for WSD 79 5.5.1 The Domain Kernel 80 5.5.2 Syntagmatic Kernels 81 5.5.3 WSD Kernels 82 5.5.4 Evaluation 82 5.6 Discussion 86
6 Multilingual Domain Models 89 6.1 Multilingual Domain Models: Definition 90 6.2 Comparable Corpora 91 6.3 Cross-language Text Categorization 92 6.4 The Multilingual Vector Space Model 93 6.5 The Multilingual Domain Kernel 95 6.6 Automatic Acquisition of Multilingual Domain Models 96 6.7 Evaluation 98 6.7.1 Implementation Details 98 6.7.2 Monolingual Text Categorization Results 99 6.7.3 Cross-language Text Categorization Results 99 6.8 Summary 100
7 Conclusion and Perspectives for Future Research 101 7.1 Summary
. . . . . . . . . . . . . . . . 101 7.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 7.2.1 Consolidation of the Present Work . . . . . . . . . . . . . . . . . . . 103 7.2.2 Domain-Driven Technologies . . . . . . . . . . . . . . . . . . . . . . . . 104 7.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 A Appendix: Kernel Methods for NLP . . . . . . . . . . . . . . . . . . . . . . . 107 A.1 Supervised Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 A.2 Feature-Based vs. Instance-Based Learning . . . . . . . . . . . . . . . . . 110 A.3 Linear Classi?ers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 A.4 Kernel Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 A.5 Kernel Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118 A.6 Kernels for Text Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
- Спойлер:
Semantic Domains in Computational Linguistics
The Handbook of Computational Linguistics and Natural Language Processing Alexander Clark, Chris Fox, and Shalom Lappin
Contents
- Спойлер:
List of Figures ix List of Tables xiv Notes on Contributors xv Preface xxiii Introduction 1

Part I Formal Foundations 9
1 Formal Language Theory 11 (Shuly Wintner)
2 Computational Complexity in Natural Language 43 (Ian Pratt-Hartmann)
3 Statistical Language Modeling 74 (Ciprian Chelba)
4 Theory of Parsing 105 (Mark-Jan Nederhof and Giorgio Satta)

Part II Current Methods 131
5 Maximum Entropy Models 133 (Robert Malouf)
6 Memory-Based Learning 154 (Walter Daelemans and Antal van den Bosch)
7 Decision Trees 180 (Helmut Schmid)
8 Unsupervised Learning and Grammar Induction 197 (Alexander Clark and Shalom Lappin)
9 Artificial Neural Networks 221 (James B. Henderson)
10 Linguistic Annotation 238 (Martha Palmer and Nianwen Xue)
11 Evaluation of NLP Systems 271 (Philip Resnik and Jimmy Lin)

Part III Domains of Application 297
12 Speech Recognition 299 (Steve Renals and Thomas Hain)
13 Statistical Parsing 333 (Stephen Clark)
14 Segmentation and Morphology 364 (John A. Goldsmith)
15 Computational Semantics 394 (Chris Fox)
16 Computational Models of Dialogue 429 (Jonathan Ginzburg and Raquel Fernández)
17 Computational Psycholinguistics 482 (Matthew W. Crocker)

Part IV Applications 515
18 Information Extraction 517 (Ralph Grishman)
19 Machine Translation 531 (Andy Way)
20 Natural Language Generation 574 (Ehud Reiter)
21 Discourse Processing 599 (Ruslan Mitkov)
22 Question Answering 630 (Bonnie Webber and Nick Webb)

References 655 Author Index 742 Subject Index 763
- Спойлер:
The Handbook of Computational Linguistics and Natural Language Processing