Fact or fiction: Content classification for digital libraries

Finn, A., Kushmerick, N., & Smyth, B. (2001). Fact or fiction: Content classification for digital libraries. Joint DELOS-NSF Workshop on Personalisation and Recommender Systems in Digital Libraries (Dublin).

Abstract
The World-Wide Web (WWW) is a vast repository of information, much of which is valuable but often hidden from the user. The anarchic nature of the WWW presents unique challenges for information extraction and categorization. We view the WWW as a valuable source of information for digital libraries. In this paper we describe the process of extracting and classifying information from the WWW for the purpose of integrating it into digital libraries. Our efforts focus on ways to automatically classify news articles according to whether they present opinions or reported facts. We describe and evaluate a system under development that automatically classifies and recommends Web news articles from the sports and politics domains.