000103638 001__ 103638
000103638 005__ 20241021130520.0
000103638 02470 $$ahttps://doi.org/10.35111/77ba-9x74$$2DOI
000103638 02470 $$aLDC2008T19$$2Other
000103638 037__ $$aACQUIRED
000103638 041__ $$aeng
000103638 245__ $$aThe New York Times Annotated Corpus
000103638 260__ $$bLinguistic Data Consortium, Portions © 1987-2008 New York Times, © 2008 Trustees of the University of Pennsylvania
000103638 269__ $$a2008-10-17
000103638 336__ $$aDataset
000103638 506__ $$aAs of February 2024, The New York Times Annotated Corpus LDC2008T19 has been withdrawn from the Linguistic Data Consortium Catalog by the data provider.
000103638 518__ $$d1987-01-01/2007-06-19$$oCreated
000103638 520__ $$a<i>The New York Times</i> Annotated Corpus contains over 1.8 million articles written and published by the New York Times between January 1, 1987 and June 19, 2007 with article metadata provided by the New York Times Newsroom, the New York Times Indexing Service and the online production staff at nytimes.com. The corpus includes: Over 1.8 million articles (excluding wire services articles that appeared during the covered period). Over 650,000 article summaries written by library scientists. Over 1,500,000 articles manually tagged by library scientists with tags drawn from a normalized indexing vocabulary of people, organizations, locations and topic descriptors. Over 275,000 algorithmically-tagged articles that have been hand verified by the online production staff at nytimes.com. Java tools for parsing corpus documents from .xml into a memory resident object.$$7Abstract
000103638 540__ $$aThe New York Times Annotated Corpus Agreement$$uhttps://catalog.ldc.upenn.edu/license/the-new-york-times-annotated-corpus-ldc2008t19.pdf
000103638 650__ $$aMedia and communications
000103638 650__ $$aOther humanities
000103638 6531_ $$alinguistics
000103638 6531_ $$acorpora
000103638 6531_ $$aNew York Times
000103638 655__ $$aText
000103638 7001_ $$aSandhaus, Evan$$7Personal
000103638 909CO $$ooai:data.library.wustl.edu:103638$$pdataset
000103638 913__ $$aNo
000103638 980__ $$aLicensed Datasets