Natural Language Annotation For Machine Learning

Thể loại: AI ;Công Nghệ Thông Tin
  • Lượt đọc : 349
  • Kích thước : 5.64 MB
  • Số trang : 343
  • Đăng lúc : 2 năm trước
  • Số lượt tải : 155
  • Số lượt xem : 1.406
  • Đọc trên điện thoại :
This book is intended as a resource for people who are interested in using computers to help process natural language. A natural language refers to any language spoken by humans, either currently (e.g., English, Chinese, Spanish) or in the past (e.g., Latin, ancient Greek, Sanskrit). Annotation refers to the process of adding metadata information to the text in order to augment a computer’s capability to perform Natural Language Processing (NLP). In particular, we examine how information can be added to natural language text through annotation in order to increase the performance of machine learning algorithms—computer programs designed to extrapolate rules from the infor mation provided over texts in order to apply those rules to unannotated texts later on.

Natural Language Annotation for Machine Learning

This book details the multistage process for building your own annotated natural language dataset (known as a corpus) in order to train machine learning (ML) algorithms for language-based data and knowledge discovery. The overall goal of this book is to show readers how to create their own corpus, starting with selecting an annotation task, creating the annotation specification, designing the guidelines, creating a “gold standard” corpus, and then beginning the actual data creation with the annotation process.

Because the annotation process is not linear, multiple iterations can be required for defining the tasks, annotations, and evaluations, in order to achieve the best results for a particular goal. The process can be summed up in terms of the MATTER Annotation Development Process: Model, Annotate, Train, Test, Evaluate, Revise. This book guides the reader through the cycle, and provides detailed examples and discussion for different types of annotation tasks throughout. These tasks are examined in depth to provide context for readers and to help provide a foundation for their own ML goals.

Additionally, this book provides access to and usage guidelines for lightweight, userfriendly software that can be used for annotating texts and adjudicating the annotations. While a variety of annotation tools are available to the community, the Multipurpose Annotation Environment (MAE) adopted in this book (and available to readers as a free download) was specifically designed to be easy to set up and get running, so that con fusing documentation would not distract readers from their goals. MAE is paired with the Multidocument Adjudication Interface (MAI), a tool that allows for quick comparison of annotated documents.